This repository is used to create the training dataset and to train a power outage model. The operational code lives in another repository. This code is published to support a related article; the data itself, however, is proprietary to the power distribution companies.
The overall process is as follows:
- Fetch and prepare necessary data
- Identify and track storm objects
- Extract predictive features
- Train classifier
- Classify
The data consists of three parts: 1) ERA5 reanalysis, 2) Luke forest inventory, and 3) power outage data.
ERA5 data is fetched from https://cds.climate.copernicus.eu/ and stored in the (private) AWS bucket `fmi-era5-world-nwp-parameters`.
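As an illustration only (the actual retrieval scripts live elsewhere, and the variable selection below is hypothetical), a minimal `cdsapi` request for ERA5 surface winds could look like this:

```python
import cdsapi

# Requires CDS credentials in ~/.cdsapirc; see https://cds.climate.copernicus.eu/
c = cdsapi.Client()
c.retrieve(
    "reanalysis-era5-single-levels",
    {
        "product_type": "reanalysis",
        # Hypothetical variable selection for storm-related features
        "variable": ["10m_u_component_of_wind", "10m_v_component_of_wind"],
        "year": "2017",
        "month": "08",
        "day": "12",
        "time": ["00:00", "06:00", "12:00", "18:00"],
        "format": "grib",
    },
    "era5_sample.grib",  # local target file; the project stores such files in S3
)
```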
Luke forest inventory data is fetched from http://kartta.luke.fi/opendata/valinta.html and stored in the (private) bucket `fmi-asi-data-puusto`. The data is stored as GeoTIFF at the original 16 m resolution and at a lowres 1.6 km resolution, plus lowres GRIB files.
The following actions can be used to fetch new versions of the files:

- Fetch the tiles from the service and upload them to the bucket with the correct name (e.g. `luke/2017/fra_luokka/xx.tif`)
- Create a composite GeoTIFF: `bin/process_tree_files.sh fra_luokka` (repeat for all parameters)
- Convert to GRIB: `bin/geotiff_to_grib` (process all parameters); a sketch of the underlying GDAL operations follows this list
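The shell scripts above wrap standard GDAL operations. As a hedged sketch of the kind of processing involved (file names and the resampling choice are assumptions, not read from the scripts), the composite and lowres conversion could be expressed in Python as:

```python
from osgeo import gdal

# Mosaic the downloaded 16 m tiles into one composite GeoTIFF
# (the tile list here is hypothetical).
gdal.Warp("fra_luokka_composite.tif", ["tile_01.tif", "tile_02.tif"])

# Downsample the composite to ~1.6 km; resolution is given in CRS units
# (meters for the Finnish grid), and 'average' resampling is an assumption
# about how the lowres product is made.
gdal.Warp(
    "fra_luokka_lowres.tif",
    "fra_luokka_composite.tif",
    xRes=1600,
    yRes=1600,
    resampleAlg="average",
)

# Convert the lowres GeoTIFF to GRIB (recent GDAL versions can write GRIB2).
gdal.Translate("fra_luokka_lowres.grib", "fra_luokka_lowres.tif", format="GRIB")
```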
Power outage data is received from power distribution companies.
See https://github.com/fmidev/sasse-era5-smartmet-grid
The dataset is split into train and test sets separately from training the classifier. This ensures a fair comparison and enables creating test examples. The split is conducted in the notebook `dataset_split`.
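The split itself lives in the notebook, but as a minimal sketch (file names, column layout, and the split ratio are assumptions), it is conceptually similar to:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical file name; the real split is done once in the dataset_split
# notebook, before any classifier training, and the resulting files are reused.
df = pd.read_csv("data/dataset.csv")
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
train_df.to_csv("data/dataset_train.csv", index=False)
test_df.to_csv("data/dataset_test.csv", index=False)
```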
Training the classifier is done partly with the script `classifier/train_classifier.py` and partly in the notebooks `train_and_validate_rfc` (random search of the best hyperparameters) and `train_and_validate_mlp`.
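As an illustration of the random hyperparameter search done in `train_and_validate_rfc` (the search space below is hypothetical, not the one in the notebook):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Hypothetical search space; the notebook defines its own grid.
param_distributions = {
    "n_estimators": [100, 200, 500],
    "max_depth": [None, 10, 20, 40],
    "min_samples_leaf": [1, 2, 5],
}
search = RandomizedSearchCV(
    RandomForestClassifier(),
    param_distributions,
    n_iter=20,
    cv=5,
    scoring="f1",
    random_state=42,
)
# search.fit(X_train, y_train)  # features/labels come from the split above
```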
The script reads the relevant configuration from command-line arguments and from `cnf/options.ini`. The config file and config section name can be set with the arguments `config_filename` and `config_name` respectively; the relevant setups at the moment are `thin` and `thin_energiateollisuus`. Train and test data are always given as the arguments `train_data` and `test_data`. If the files do not exist locally, the script tries to fetch them from the AWS bucket named by the `s3_bucket` variable in `cnf/options.ini`. The `model` argument selects the classifier (the script supports svct, rfc, gnb, and gp), and the `dataset` argument is used to format the model and results output paths.
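A minimal sketch of how such a config lookup can work (the exact option names are defined in `cnf/options.ini`; this is an assumption about the mechanism, not the script's actual code):

```python
import configparser

config = configparser.ConfigParser()
config.read("cnf/options.ini")              # config_filename argument
section = config["thin_energiateollisuus"]  # config_name argument
s3_bucket = section["s3_bucket"]            # bucket used when data files are missing locally
```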
In practice, the script is run with docker-compose. To train, for example, the energiateollisuus dataset with a 20 m/s threshold, one could use the following commands:
```bash
export model=rfc
export dataset=national_random_20
export train_data=data/energiateollisuus_random_20_thin_res.csv
export test_data=data/energiateollisuus_random_20_thin_test.csv
export config_name=thin_energiateollisuus
docker-compose run --rm cl
```
For operational use, see https://github.com/fmidev/contour-storms
`classifier/create_examples.py` is used to create examples. Consult the source code and the docker-compose file for more details.