WildfiresPilot

This project is an extension of the GBDX Notebook Post-Fire Damage Assessment from High Resolution Imagery. It provides the following:

  1. Segment image of wildfire and convert to geojson
  2. Script to convert geojson to features for RandomForestClassifier
  3. Script to train the RandomForestClassifier and optionally use hyperparameter tuning

Getting Started

To start, you will need to install Anaconda and set up a virtualenv with gbdxtools for the Python Notebooks, or you can work directly in GBDX Notebooks. To learn how to set up gbdxtools locally, follow this link: https://github.com/GeoBigData/gbdx-training/tree/master/gbdxtools_module
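Once gbdxtools is set up, loading an image looks roughly like the following. This is a minimal sketch; the catalog ID and bounding box are placeholders, not values from this project.

from gbdxtools import CatalogImage

# Placeholder catalog ID and bbox -- substitute your own wildfire scene.
img = CatalogImage('104001002838EC00',
                   band_type='MS',
                   bbox=[-122.8, 38.4, -122.7, 38.5])
print(img.shape)  # (bands, rows, cols)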


Creating Geojsons

Follow my Python Notebook Creating segmented geojson to see how this is done. You may run through it and try out your own CatalogImage.
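As a rough illustration of the idea, one way to segment an image and export the regions as geojson uses scikit-image and rasterio. This is a minimal sketch under assumed parameters, not the Notebook's exact code; the bounding box is a placeholder.

import json
import numpy as np
from skimage.segmentation import felzenszwalb
from rasterio import features
from rasterio.transform import from_bounds

rgb = np.random.rand(256, 256, 3)  # stand-in for your CatalogImage array
segments = felzenszwalb(rgb, scale=100, sigma=0.5, min_size=50)

# Map pixel coordinates to the image's geographic bounds (placeholder bbox).
transform = from_bounds(-122.8, 38.4, -122.7, 38.5, 256, 256)

feats = [{"type": "Feature", "geometry": geom, "properties": {"label": int(val)}}
         for geom, val in features.shapes(segments.astype(np.int32),
                                          transform=transform)]

with open("segments.geojson", "w") as f:
    json.dump({"type": "FeatureCollection", "features": feats}, f)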

This Notebook will give you multiple geojsons, so before heading to the next steps, run the script Combine_geojsons.py to combine them into one.

python Combine_geojsons.py <directory_path_to_geojsons>
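For reference, combining boils down to concatenating the features of each FeatureCollection. A simplified sketch of the idea, not the script itself:

import glob
import json
import os
import sys

directory = sys.argv[1]
combined = {"type": "FeatureCollection", "features": []}

# Merge the features from every geojson in the given directory.
for path in sorted(glob.glob(os.path.join(directory, "*.geojson"))):
    with open(path) as f:
        combined["features"].extend(json.load(f)["features"])

with open("combined.geojson", "w") as f:
    json.dump(combined, f)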

After you have created the geojson, you can optionally view it in QGIS and edit the file to your needs. Relabel polygons as necessary. Use the IDAHO Layer QGIS Plugin to view the geojson layer on the TMS of your wildfire. To find your IDAHO layer, check out this link and search for your CatalogImage: https://idaho.geobigdata.io/.

Convert Geojson to Features

This script takes the labelled geojson and converts it to NumPy arrays containing the dataset of RSI features. You can also give it a zipped geojson if your remote repository doesn't have capacity for the raw geojson data. The NumPy arrays will be saved out to your specified output directory.

python 1_geojson_to_feats.py -i <path_to_geojson> -o <output_directory> --yes

For help on the options, run the following:

python 1_geojson_to_feats.py --help
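To illustrate the kind of band-derived feature involved, a ratio index such as NDVI can be summarized per polygon. This is a hedged sketch with placeholder band indices; the actual RSI feature set and band order are defined in 1_geojson_to_feats.py.

import numpy as np

def ndvi(ms_pixels, red_idx=4, nir_idx=6):
    # ms_pixels: (n_pixels, n_bands) array of a polygon's multispectral values.
    # The band indices are placeholders and depend on the sensor's band order.
    red = ms_pixels[:, red_idx].astype(float)
    nir = ms_pixels[:, nir_idx].astype(float)
    return (nir - red) / (nir + red + 1e-9)

pixels = np.random.randint(0, 2048, size=(500, 8))  # fake 8-band polygon pixels
row = [ndvi(pixels).mean(), ndvi(pixels).std()]     # one feature row per polygon
np.save("feats.npy", np.array([row]))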

Random Forest Model Training

This next script will load the NumPy arrays and train the default RandomForestClassifier model from Scikit-learn. Optionally, you can toggle Randomized search or Grid search for hyperparameter tuning.
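A minimal sketch of that default path, assuming the features were saved as X.npy and the labels as y.npy (the script's actual loading and options live in 2_train_model.py):

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X = np.load("X.npy")  # (n_samples, n_features) RSI features
y = np.load("y.npy")  # polygon labels

# Hold out a stratified test split, then fit the default classifier.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
joblib.dump(clf, "model.pkl")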

Below is an example run of the script with its options:

python 2_train_model.py -i <directory_path_to_numpy_feats> -o <output_model_path/model_name.pkl> --modeltype default --yes

For help on the options, run the following:

python 2_train_model.py --help

Inspiration for Hyperparameter Tuning

The use of random search and grid search follows William Koehrsen's article, Hyperparameter Tuning the Random Forest in Python.

RandomizedSearchCV

Lines 55-72 define the grid used for RandomizedSearchCV, which randomly picks parameter combinations from this grid to test. There are a total of 7,020,000 parameter combinations, but specifying the number of iterations will only randomly select a few of these.

Line 75 is the StratifiedKFold for partitioning the dataset. The default is k=3, which splits the data into 3 folds: each iteration trains on 2 folds and validates on the remaining fold. Each fold is stratified so that it replicates the overall label distribution, i.e. if 20% of the labels are False, then roughly 20% of each fold will be False.

Line 76 instantiates RandomizedSearchCV. This takes the parameter grid, and n_iters specifies the number of random parameter combinations to train and test, i.e. if n_iters = 1000, then only 1,000 of the 7,020,000 parameter combinations will be randomly tried.
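Put together, the setup looks roughly like this. The value ranges below are illustrative, not the script's exact grid, and note that scikit-learn's own keyword is n_iter:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Illustrative grid -- not the exact grid from lines 55-72.
param_grid = {
    "n_estimators": [int(x) for x in np.linspace(200, 2000, 10)],
    "max_features": ["sqrt", "log2"],
    "max_depth": [int(x) for x in np.linspace(10, 110, 11)] + [None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "bootstrap": [True, False],
}

cv = StratifiedKFold(n_splits=3)
search = RandomizedSearchCV(RandomForestClassifier(),
                            param_distributions=param_grid,
                            n_iter=1000, cv=cv, n_jobs=-1, random_state=42)
# search.fit(X_train, y_train), then inspect search.best_params_ and
# search.cv_results_ for the top candidates.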

Example to run random search:

python 2_train_model.py -i <directory_path_to_numpy_feats> -o <output_model_path/model_name.pkl> --modeltype random --testsize 0.3 --n_iters 1000 --yes

GridSearchCV

Because we don't want to try every possible parameter candidate, I use the random search results to select the top 10 candidates with the best parameters. cv_results_ provides the candidates and their parameters. You can see this in lines 106-108.

Lines 111-113 define the parameter grid for the grid search. By default, 10 n_estimators values are selected between the minimum and maximum n_estimators among the top 10 candidates produced by RandomizedSearchCV, and the same follows for the 10 max_depth values. By default, this makes 100 parameter combinations.

Line 116 instantiates the GridSearchCV. This uses the parameter grid derived from the RandomizedSearchCV results, and fixes max_features, min_samples_split, min_samples_leaf, and bootstrap at the top candidate's values. Given k-fold cross-validation, this creates 100×k fits for the grid search, i.e. if k=10 then 1,000 model fits are run.
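A sketch of that narrowing step, with placeholder min/max values standing in for what would be read out of cv_results_ (the real logic is on lines 106-116 of 2_train_model.py):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Placeholder ranges -- in the script these come from the top-10 candidates.
n_estm_lo, n_estm_hi = 400, 1600
depth_lo, depth_hi = 20, 90

param_grid = {
    "n_estimators": [int(x) for x in np.linspace(n_estm_lo, n_estm_hi, 10)],
    "max_depth": [int(x) for x in np.linspace(depth_lo, depth_hi, 10)],
}

# The remaining hyperparameters are fixed at the top candidate's values
# (placeholders here).
base = RandomForestClassifier(max_features="sqrt", min_samples_split=2,
                              min_samples_leaf=1, bootstrap=True)

grid = GridSearchCV(base, param_grid=param_grid, cv=10, n_jobs=-1)
# grid.fit(X_train, y_train): 100 combinations x 10 folds = 1,000 fits.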

Example to run grid search:

python 2_train_model.py -i <directory_path_to_numpy_feats> -o <output_model_path/model_name.pkl> --modeltype 'grid' --testsize 0.3 --k 10 --n_estm_grid 10 --n_max_depth 10 --yes

Authors

  • Ai-Linh Alten - Initial work - aalten77
