Configuration
Last updated on 2024-11-26
Estimated time: 20 minutes
Overview
Questions
- What is the user configuration file and how should I use it?
Objectives
- Understand the contents of the config-user.yml file
- Prepare a personalized config-user.yml file
- Configure ESMValTool to use some settings
The configuration file
For the purposes of this tutorial, we will create a directory called esmvaltool_tutorial in our home directory and use it as our working directory. Running mkdir ~/esmvaltool_tutorial followed by cd ~/esmvaltool_tutorial in the terminal should do that.
The config-user.yml configuration file contains all the global-level information ESMValTool needs to run. It is a YAML file. You can get the default configuration file by running the command esmvaltool config get_config_user --path=<target_dir>.
The default configuration file will be downloaded to the directory specified with the --path option. For instance, you can provide the path to your working directory as the target_dir. If this option is not used, the file will be saved to the default location, ~/.esmvaltool/config-user.yml, where ~ is the path to your home directory. Files and directories whose names start with a period are “hidden”; to see the .esmvaltool directory in the terminal, use ls -la ~. If a configuration file with that name already exists in the default location, the get_config_user command will not overwrite it. You will have to move the existing file first if you want an updated copy of the user configuration file.
We use a text editor called nano to look inside the configuration file and modify it if needed, for example by running nano ~/.esmvaltool/config-user.yml if the file is in the default location. Any other editor can be used, e.g. vim.
This file contains the information for:
- Output settings
- Destination directory
- Auxiliary data directory
- Number of tasks that can be run in parallel
- Rootpath to input data
- Directory structure for the data from different projects
Text editor side note
No matter what editor you use, you will need to know where it searches for and saves files. If you start it from the shell, it will (probably) use your current working directory as its default location. We use nano in examples here because it is one of the least complex text editors. Press Ctrl + O to save the file, and then Ctrl + X to exit nano.
Output settings
The configuration file starts with output settings that tell ESMValTool about your preferences for output. You can turn each setting on or off with a true or false value. Most of these settings are fairly self-explanatory.
Saving preprocessed data
Later in this tutorial, we will want to look at the contents of the preproc folder. This folder contains preprocessed data and is removed by default when ESMValTool is run. In the configuration file, which settings can be modified to prevent this from happening?
If the option remove_preproc_dir is set to false, then the preproc/ directory is kept and contains all the preprocessed data and the metadata interface files. If the option save_intermediary_cubes is set to true, then data will also be saved after each preprocessor step in the preproc folder. Note that saving all intermediate results to file will slow ESMValTool down considerably and can quickly fill your disk.
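As a minimal sketch of the answer, the relevant lines of config-user.yml could look like this. Both option names come from the text above; the values shown are one reasonable choice for this tutorial, not the only one.

```yaml
# Keep the preproc/ directory after the run so it can be inspected later
remove_preproc_dir: false
# Leave this off unless you need the output of every preprocessor step;
# turning it on slows the run down and uses much more disk space
save_intermediary_cubes: false
```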
Destination directory
The destination directory is the root path where ESMValTool stores its output folders, which contain figures, data, logs, and so on. With every run, ESMValTool automatically generates a new output folder named after the recipe and the date and time of the run, using the format YYYYMMDD_HHMMSS.
Set the destination directory
Let’s name our destination directory esmvaltool_output in the working directory. ESMValTool should write the output to this path, so make sure you have enough disk space to write output to this directory. How do we set this in the config-user.yml?
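One way to do this, assuming the working directory is ~/esmvaltool_tutorial as created earlier in this lesson, is to set output_dir in config-user.yml:

```yaml
# Destination directory for all ESMValTool output (figures, data, logs)
output_dir: ~/esmvaltool_tutorial/esmvaltool_output
```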
Rootpath to input data
ESMValTool uses several categories (referred to as projects in ESMValTool) for input data, based on their source. The current categories in the configuration file are mentioned below. For example, CMIP is used for a dataset from the Coupled Model Intercomparison Project, whereas OBS may be used for an observational dataset. More information about the projects used in ESMValTool is available in the [documentation](https://docs.esmvaltool.org/projects/esmvalcore/en/latest/quickstart/find_data.html).
When using ESMValTool on your own machine, you can create a directory to download climate model data or observation data sets and let the tool use data from there. It is also possible to ask ESMValTool to download climate model data as needed. This can be done by specifying a download directory and by setting the option to download data as shown below.
YAML
# Directory for storing downloaded climate data
download_dir: ~/climate_data
search_esgf: always
If you are working offline or do not want to download the data, set search_esgf to never. If you want to download data only when the necessary files are missing at the usual location, set it to when_missing.
The rootpath setting specifies the directories where ESMValTool will look for input data. For each category, you can define either one path or several paths as a list. For example:
YAML
rootpath:
  CMIP5: [~/cmip5_inputpath1, ~/cmip5_inputpath2]
  OBS: ~/obs_inputpath
  RAWOBS: ~/rawobs_inputpath
  default: ~/climate_data
These are typically available in the default configuration file you downloaded, so simply removing the machine-specific lines should be sufficient to access input data.
- Are you working on your own local machine? You need to add the root path of the folder where the data is available to the rootpath section of the config-user.yml file.
- Are you working on your local machine and have downloaded data using ESMValTool? You need to add the root path of the folder where the data has been downloaded, as specified in download_dir.
- Are you working on a computer cluster like JASMIN or DKRZ? Site-specific paths to the data for JASMIN/DKRZ/ETH/IPSL are already listed at the end of the config-user.yml file. You need to uncomment the related lines. For example, on JASMIN:
YAML
auxiliary_data_dir: /gws/nopw/j04/esmeval/aux_data/AUX
rootpath:
  CMIP6: /badc/cmip6/data/CMIP6
  CMIP5: /badc/cmip5/data/cmip5/output1
  OBS: /gws/nopw/j04/esmeval/obsdata-v2
  OBS6: /gws/nopw/j04/esmeval/obsdata-v2
  obs4MIPs: /gws/nopw/j04/esmeval/obsdata-v2
  ana4mips: /gws/nopw/j04/esmeval/obsdata-v2
  default: /gws/nopw/j04/esmeval/obsdata-v2
- For more information about setting the rootpath, see also the ESMValTool [documentation](https://docs.esmvaltool.org/projects/esmvalcore/en/latest/quickstart/find_data.html).
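For the first two cases in the list above, the rootpath entry might look like the following sketch. The ~/esmvaltool_tutorial/data path is a hypothetical location chosen for illustration, and ~/climate_data is the download_dir used in the earlier example. Which project goes with which folder is also only an assumption here, so replace both entries with the projects and folders you actually use.

```yaml
rootpath:
  # Local machine: data you copied yourself (hypothetical path)
  CMIP5: ~/esmvaltool_tutorial/data
  # Data downloaded by ESMValTool: same folder as download_dir
  CMIP6: ~/climate_data
```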
Directory structure for the data from different projects
Input data can come from various models, observations and reanalysis data that adhere to the CF/CMOR standard. The drs setting describes the file structure for several projects (e.g. CMIP6, CMIP5, obs4mips, OBS6, OBS) on several key machines (e.g. BADC, CP4CDS, DKRZ, ETHZ, SMHI, BSC). For more information about drs, you can visit the ESMValTool documentation on [Data Reference Syntax (DRS)](https://docs.esmvaltool.org/projects/esmvalcore/en/latest/quickstart/find_data.html#cmor-drs).
- Are you working on your own local machine? You need to set the drs of the data in the config-user.yml file, for example to default for data kept in a folder without any DRS-like structure.
- Are you asking ESMValTool to download the data for use with your diagnostics? You need to set the drs of the data in the config-user.yml file to ESGF, which matches the layout used by ESMValTool's automatic download.
- Are you working on a computer cluster like JASMIN or DKRZ? The site-specific drs of the data is already listed at the end of the config-user.yml file. You need to uncomment the related lines.
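The three cases above could be sketched as follows. The default value for a local machine is the one used in this lesson's exercise. ESGF and BADC are the values that, to our knowledge, the default configuration file uses for downloaded data and for JASMIN respectively, so check them against your own copy of the file before relying on them.

```yaml
# Local machine: all files sit in one folder without a DRS-like structure
drs:
  CMIP5: default

# Data downloaded by ESMValTool (assumed values, verify in your config file):
# drs:
#   CMIP5: ESGF
#   CMIP6: ESGF

# JASMIN (assumed values, verify in your config file):
# drs:
#   CMIP5: BADC
#   CMIP6: BADC
#   OBS: default
```

Only one drs section should be active at a time, which is why the alternatives are left commented out.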
Explain the default drs (if working on local machine)
- In the previous exercise, we set the drs of CMIP5 data to default. Can you explain why?
- Have a look at the directory structure of the OBS data. There is a folder called Tier1. What does it mean?
- drs: default is one way to retrieve data from a ROOT directory that has no DRS-like structure. default indicates that all the files are in a folder without any structure.
- Observational data are organized in Tiers depending on their level of public availability. Therefore the default directory must be structured accordingly with sub-directories TierX (e.g. Tier1, Tier2 or Tier3), even when drs: default. More details can be found in the [documentation](https://docs.esmvaltool.org/projects/esmvalcore/en/latest/quickstart/find_data.html#observational-data).
Other settings
Auxiliary data directory
The auxiliary_data_dir setting is the path where any additional auxiliary data files are stored. It tells the diagnostic script where to find these files if they cannot be downloaded at runtime. This option should not be used for model or observational datasets. Use it for data files used in plotting, such as coastline descriptions, and for any additional data (e.g. shape files) you want to feed to your recipe.
See more information in the ESMValTool [documentation](https://docs.esmvaltool.org/projects/ESMValCore/en/latest/quickstart/configure.html?highlight=auxiliary_data#user-configuration-file).
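As an illustration, the setting is a single path in config-user.yml. The folder name below is a hypothetical choice for this tutorial; any directory holding your shape files or similar data would do.

```yaml
# Folder with extra files (e.g. shape files) used by diagnostics (hypothetical path)
auxiliary_data_dir: ~/esmvaltool_tutorial/auxiliary_data
```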
Number of parallel tasks
This option enables you to perform parallel processing. You can set the number of parallel tasks to 1, 2, 3, 4, and so on, or you can set it to null, which tells ESMValTool to use the maximum number of available CPUs. For the purpose of the tutorial, please set max_parallel_tasks to 1 so that ESMValTool uses only one CPU.
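In config-user.yml this corresponds to the following line; the commented alternative shows the null value described above.

```yaml
# Run one task at a time
max_parallel_tasks: 1
# Use all available CPUs instead:
# max_parallel_tasks: null
```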
In general, if you run out of memory, try setting max_parallel_tasks to 1. Then check how much memory that run needed by inspecting the file run/resource_usage.txt in the output directory. Using the number there, you can increase the number of parallel tasks again to a reasonable value for the amount of memory available on your system.
Make your own configuration file
It is possible to have several configuration files with different purposes, for example: config-user_formalised_runs.yml, config-user_debugging.yml. In this case, you have to pass the path of your own configuration file as a command-line option when running the ESMValTool. We will learn how to do this in the next lesson.
Key Points
- The config-user.yml tells ESMValTool where to find input data.
- output_dir defines the destination directory.
- rootpath defines the root path of the data.
- drs defines the directory structure of the data.