icon_globe

Download and plot ICON data over the global domain.

In the following repository I include a fully-functional suite of scripts needed to download, merge and plot data from the ICON model, which is freely available at https://opendata.dwd.de/weather/.

The main script to be called (possibly through cronjob) is copy_data.run. There, the current run version is determined, and files are downloaded from the DWD server. CDO is used to merge the files. At the end of the process one single NETCDF file with all the timesteps for every variable is created. We keep these files separated and merge them whe necessary in Python.

Installation

This is not package! It is just a collection of scripts which can be run from copy_data.run. It was tested on Linux and MacOS; it will not run on Windows since it uses bash. To install it just clone the folder.

You need the following UNIX utilities to run the main script

GNU parallel to parallelize the download and processing of data
ncftp to upload pictures to FTP
cdo for the preprocessing
wget to download the files
bzip2 to decompress the downloaded files

The python installation can be re-created with the up-to-date requirements.txt. The script was succesfully tested on both python 2.7.15 and python 3.7.8. The 2.7 version for now is the most stable.

The most important packages to have installed are

numpy
pandas
metpy
xarray
dask
basemap
matplotlib
seaborn
scipy
geopy

Inputs to be defined

Most of the inputs needed to run the code are contained at the beginning of the main bash script copy_data.run. In particular MODEL_DATA_FOLDER where the processing is done (downloading of files and creation of pictures). NCFTP_BOOKMARK is the FTP bookmark to be defined in ncftp so that user and password don't need to be entered every time. We use a conda environment with all the packages needed to run the download/processing of the data so that we can easily run into crontab without the need to load any additional packages.

Parallelized donwload of data

Downloading and merging the data is one of the process that can take more time depending on the connection. For this reason this is fully parallelized making use of the GNU parallel utility.

# 2-D variables
variables=("PMSL" "CLCT" "TOT_PREC" "VMAX_10M" "RAIN_CON" "RAIN_GSP" "SNOW_CON" "SNOW_GSP" "T_2M" "TD_2M" "WW" "CAPE_CON")
${parallel} -j ${N_CONCUR_PROCESSES} download_merge_2d_variable_icon_globe ::: "${variables[@]}"

# 3-D variables on pressure levels
variables=("T" "FI" "RELHUM" "U" "V")
${parallel} -j ${N_CONCUR_PROCESSES} download_merge_3d_variable_icon_globe ::: "${variables[@]}"

The list of variables to download using such parallelization is provided as bash array. 2-D and 3-D variables have different routines: these are all defined in the common library functions_download_dwd.sh. The link to the DWD opendata server is also defined in this file.

Parallelized plotting

Plotting of the data is done using Python, but anyone could potentially use other software. This is also parallelized given that plotting routines are the most expensive part of the whole script and can take a lot of time (up to 2 hours depending on the load). This is make especially easier by the fact that the plotting scripts can be given as argument the projection so we can parallelize across multiple projections and script files, for example:

scripts=("plot_jetstream.py" "plot_rain_acc.py" "plot_clouds.py" "plot_geop_500.py" "plot_mslp_wind.py" "plot_thetae.py")
projections=("world" "nh" "us")

${parallel} -j 6 ${python} ::: "${scripts[@]}" ::: "${projections[@]}"

Furthermore in every individual python script a parallelization using multiprocessing.Pool over chunks of the input timesteps is performed. This means that, using the same ${N_CONCUR_PROCESSES}, different plotting istances will act over chunks of 10 timesteps each to speed up the processes. The chunk size can be changed in utils.py. NOTE Depending on what is passed to multiprocessing.Pool.map in args you could get an error since some objects cannot be pickled. Make sure that you're passing only the necessary arrays for the plotting and not additional objects (e.g. pint arrays created by metpy may be the culprit of the error).

Note that every Python script used for plotting has an option debug=True to allow some testing of the script before pushing it to production. When this option is activated the PNG figures will not be produced and the script will not be parallelized. Instead just 1 timestep will be processed and the figure will be shown in a window using the matplotlib backend.

Upload of the pictures

PNG pictures are uploaded to a FTP server defined in ncftp bookmarks. This operation is NOT parallelized because the FTP server may not allow concurrent connections.

Additional files

ICON invariant data are automatically download by download_invariant_icon_globe.

Name		Name	Last commit message	Last commit date
Latest commit History 40 Commits
.ipynb_checkpoints		.ipynb_checkpoints
grids		grids
plotting		plotting
.gitignore		.gitignore
README.md		README.md
TODO.md		TODO.md
benchmarks.md		benchmarks.md
copy_data.run		copy_data.run
cron.tab		cron.tab
functions_download_dwd.sh		functions_download_dwd.sh
get_last_run.py		get_last_run.py
grid.cdo		grid.cdo
plot_data.ipynb		plot_data.ipynb

guidocioni/icon_globe

Folders and files

Latest commit

History

Repository files navigation

icon_globe

Installation

Inputs to be defined

Parallelized donwload of data

Parallelized plotting

Upload of the pictures

Additional files

About

Resources

Stars

Watchers

Forks

Languages