
a benchmark dataset for training and evaluating global cloud classification models.


wzl1360917/CUMULO

 
 


CUMULO is a benchmark dataset for training and evaluating global cloud classification models. It merges two satellite products from the A-train constellation: the Moderate Resolution Imaging Spectroradiometer (MODIS) product from the Aqua satellite, and the 2B-CLDCLASS-LIDAR product, which is derived from the combination of the CloudSat Cloud Profiling Radar (CPR) and the CALIPSO Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP).

FULL README

Dataset

The dataset is hosted here. It contains over 100k annotated multispectral images at 1 km x 1 km resolution, providing full daily coverage of the Earth for 2008. Years 2009 and 2016 are coming soon.

Download

Option 1: syncing with your Dropbox account

  1. add CUMULO to your Dropbox account
  2. use rclone to sync it to your machine

Option 2: direct download

  1. use one of these download scripts

File Format

Data is stored in Network Common Data Form (NetCDF) following this convention.

There is one NetCDF file per swath of 1354 x 2030 pixels. A swath is recorded every 5 minutes, and its file is named:

filename = AYYYYDDD.HHMM.nc

YYYY => year
DDD => absolute day since 01.01.2008 
HH => hour of day
MM => minutes    
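
The naming scheme above can be decoded with a few lines of standard-library Python. The following is a minimal sketch, not part of the CUMULO code base: the helper names `parse_swath_filename` and `swath_filename` are hypothetical, and the sketch assumes that DDD is a 1-based day counter, so that 001 corresponds to 1 January of year YYYY.

```python
import re
from datetime import datetime, timedelta

# "A", 4-digit year, 3-digit day, ".", 2-digit hour, 2-digit minute, ".nc"
_SWATH_NAME = re.compile(r"^A(\d{4})(\d{3})\.(\d{2})(\d{2})\.nc$")


def parse_swath_filename(filename):
    """Return the swath start time encoded in a file name (hypothetical helper).

    Assumption: DDD is 1-based, i.e. 001 is 1 January of YYYY.
    """
    match = _SWATH_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a CUMULO swath file name: {filename!r}")
    year, day, hour, minute = (int(group) for group in match.groups())
    if day < 1 or hour > 23 or minute > 59:
        raise ValueError(f"invalid day or time in file name: {filename!r}")
    # Day 001 adds zero days to 1 January, day 002 adds one day, and so on.
    return datetime(year, 1, 1) + timedelta(days=day - 1, hours=hour, minutes=minute)


def swath_filename(start):
    """Build the file name for a swath starting at `start` (same assumption)."""
    day = start.timetuple().tm_yday  # 1-based day of the year
    return f"A{start.year:04d}{day:03d}.{start.hour:02d}{start.minute:02d}.nc"
```

Under this assumption, `parse_swath_filename("A2008032.1355.nc")` yields 1 February 2008 at 13:55, and `swath_filename` maps that timestamp back to the same name.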

File Content

To see the variables available in a NetCDF file and their descriptions, run:

ncdump -h netcdf/cumulo.nc

Source Code

  1. The script pipeline.py extracts one CUMULO swath (as a NetCDF file) from the corresponding MODIS MYD02, MYD03, MYD06 and MYD35 files and the CloudSat CS_2B-CLDCLASS and/or CS_2B-CLDCLASS-LIDAR files.
python3 pipeline.py <save-dir> <myd02-filename>
  2. src/ contains the source code for extracting the different CUMULO features, for aligning them, and for filling in missing values when possible.

Dependencies

pip install gcsfs
conda install -c conda-forge pyhdf  # the pip wheels were broken at the time of writing
pip install satpy
pip install "satpy[modis_l1b]"
pip install -r requirements.txt

Cite

If you find this work useful, please cite the original paper:

@article{zantedeschi2019cumulo,
        title={Cumulo: A Dataset for Learning Cloud Classes},
        author={Zantedeschi, Valentina and Falasca, Fabrizio and Douglas, Alyson and Strange, Richard and Kusner, Matt J and Watson-Parris, Duncan},
        journal={arXiv preprint arXiv:1911.04227},
        year={2019}}

Acknowledgments

This work is the result of the 2019 ESA Frontier Development Lab Atmospheric Phenomena and Climate Variability challenge. We are grateful to all organisers, mentors and sponsors for giving us this opportunity. We thank Google Cloud for providing the computing and storage resources used to complete this work.
