- Gabriel Hayat
- Mounir Amrani
- Philipe Andreu
- Emilien Pilloud
Install virtualenvwrapper if not already done:
pip install virtualenvwrapper
Create a new virtual environment:
mkvirtualenv -p python3.6 "env-name"
The virtual environment is by default activated. You can disable it and enable it using:
deactivate
workon "env-name"
Install pip requirements:
pip install -r requirements.txt
To remove the virtual environment, simply disable it as shown above then run:
rmvirtualenv "env-name"
Please put the image folders labeled
, query
and scored
and the csv files labeled.csv
and scored.csv
under the same directory called data
that should be in the same directory as the *.py
scripts.
Then, to generate the 38 manual features for all the images in the labeled
, query
and scored
datasets, please run:
python3 generate_feats.py
This will generate a new folder called features
under data
in which you can find the features files and the corresponding ids files (used to map each feature vector to the id of the corresponding image). The final directory structure should be:
File | Description |
---|---|
data | Data folder. |
├ labeled | Labeled Image Directory |
├ scored | Scored Image Directory |
├ query | Query Image Directory |
├ labeled.csv | Labeled Images' Labels |
├ scored.csv | Scored Images' Scores |
└ features | Features folder. |
├ labeled_feats.gz | Labeled Features |
├ labeled_feats_ids.gz | Labeled Features ID Correspondences |
├ query_feats.gz | Query Features |
├ query_feats_ids.gz | Query Features ID Correspondences |
├ scored_feats.gz | Scored Features |
└ scored_feats_ids.gz | Scored Features ID Correspondences |
The model definition can be found in DCGAN.py
. To train the model with the default parameters, simply run:
python3 train_DCGAN.py
You can optionnally add the options -ls
, -mb
, -rot
to use label smoothing, minibatch discrimination and data augmentation with rotation respectively
This generates a LOG_DCGAN
folder with the following structure:
File | Description |
---|---|
LOG_DCGAN | DCGAN logs folder. |
└ [date-time] | Date and time of the run |
├ checkpoints | Directory containing the last 5 checkpoints saved |
├ code.zip | Zip file containing the code used for the run |
├ output | Messages printed to standard output |
└ test_samples | Directory containing the test sample images generated during training |
The model definition can be found in MCGAN.py
. To train the model with the default parameters, simply run:
python3 train_MCGAN.py
You can optionnally add the options -ls
, -mb
, -rot
to use label smoothing, minibatch discrimination and data augmentation with rotation respectively
This generates a LOG_MCGAN
folder with the following structure:
File | Description |
---|---|
LOG_MCGAN | MCGAN logs folder. |
└ [date-time] | Date and time of the run |
├ checkpoints | Directory containing the last 5 checkpoints saved |
├ code.zip | Zip file containing the code used for the run |
├ output | Messages printed to standard output |
└ test_samples | Directory containing the test sample images generated during training |
The model definition can be found in StackedSRM.py
. To train the model with the default parameters, simply run:
python3 train_stackedSRM.py
Please make sure that you have enough GPU memory to run the model to avoid memory exceptions towards the end of training. Alternatively you can download the log directory containing model checkpoint (trained on Google Colab's Tesla T4 GPU with 16GB of memory for around 3h) from the following link:
https://polybox.ethz.ch/index.php/s/c7DHpbXWf76EDp1
Please extract the archive file and put the LOG_SRM
folder in the same directory as the *.py
scripts.
When running the model training, this generates a LOG_SRM
folder with the following structure:
File | Description |
---|---|
LOG_SRM | SRM logs folder. |
└ [date-time] | Date and time of the run |
├ checkpoints | Directory containing the last 5 checkpoints saved |
├ code.zip | Zip file containing the code used for the run |
├ output | Messages printed to standard output |
└ samples | Directory containing samples of the output of the SRM model on the training data. |
The model definition can be found in FullresGAN.py
. To train the model with the default parameters, simply run:
python3 train_FullresGAN.py
You can optionnally add the options -ls
, -mb
to use label smoothing and minibatch discrimination respectively
This generates a LOG_FullresGAN
folder with the following structure:
File | Description |
---|---|
LOG_FullresGAN | FullresGAN logs folder. |
└ [date-time] | Date and time of the run |
├ checkpoints | Directory containing the last 5 checkpoints saved |
├ code.zip | Zip file containing the code used for the run |
├ output | Messages printed to standard output |
└ test_samples | Directory containing the test sample images generated during training |
In order to train the Regressors based on manually extracted features, you can simply run:
python3 baseline_score_regressor.py --regressor_type <reg_name>
Replace <reg_name> with one of the following arguments for different regressors:
Argument | Description |
---|---|
Boost | XGBoost Regressor |
Ridge | Ridge Regressor |
Random_Forest | Random Forest Regressor |
This generates a Regressor
folder with the following structure:
File | Description |
---|---|
Regressor | Main folder. |
└ [reg_name] | Trained Regressor Name |
├ checkpoints | Directory containing the checkpoints saved |
└ predictions | |
└ predictions.csv | The Regressor's predictions on the query set |
The model definition can be found in DCGAN_Scorer.py
. This model adds and trains a simple Feed-Forward Neural Network to the discriminator of the DCGAN model. So, since this model depends on a trained DCGAN model, please make sure to train the DCGAN model first then train the scoring head. This model also assumes no mini-batch discrimination was in the trained DCGAN. We trained this model on the DCGAN without the options -ls
, -mb
, -rot
.
To train this model using the latest trained DCGAN model and the default parameters, simply run:
python3 train_DCGAN_for_score.py
This generates a LOG_DCGAN_SCORER
folder with the following structure:
File | Description |
---|---|
LOG_DCGAN_SCORER | Main folder. |
└ [date-time] | Date and time of the run |
├ checkpoints | Directory containing the last 5 checkpoints saved |
├ code.zip | Zip file containing the code used for the run |
└ output | Messages printed to standard output |
Once the training done, you can generate score predictions on the query
dataset by running:
python3 test_DCGAN_scorer.py
This would load the latest trained DCGAN Scorer model and generate a predictions
folder containing the predictions.csv
file. The directory structure is as follows:
File | Description |
---|---|
LOG_DCGAN_SCORER | Main folder. |
└ [date-time] | Date and time of the run of the loaded model |
└ predictions | Directory containing the predictions |
└ [date-time] | Date and time of the score predictions generation |
└ predictions.csv | Score predictions file |
To run this model, simply run:
python3 baseline_patches.py
This generates a LOG_PATCHES
folder with the following structure:
File | Description |
---|---|
LOG_PATCHES | Patches baseline logs folder. |
└ [date-time] | Date and time of the run |
├ output | Messages printed to standard output |
└ generated_samples | Directory containing the generated samples |
└ 1000 | Directory containing generated images of size 1000x1000 |
You can use the code in test_GAN_SRM_Scorer.py
to execute the generation pipeline: this consits in generating an image using a GAN model, then upsampling the image using the SRM model (in case the generated image is 64x64), then optionally scoring the image with a chosen scoring model in order to filter or keep it. By default 100 images are generated.
There are 2 strategies for deciding whether or not to keep an image when using a scorer:
- Keeping only images with a score above a certain threshold
t
(by default 3.0). - Scoring the images of the
labeled
dataset that represent galaxies (i,e labeled1
), then taking the mean of these scores and adding a marginm
(by default 0.25) to this mean in order to get a thresholdt
. If an image has a score belowt
, it is filtered, otherwise it is kept.
To run the generation pipeline, simply run:
python3 test_GAN_SRM_Scorer.py --generator <GENERATOR> --scorer <SCORER> --use_threshold --threshold <THRESHOLD>
The option --scorer
is optional and can be ommited.
To use a margin on the mean score of the labeled
galaxy images, replace --use_threshold
by --use_margin
and --threshold <THRESHOLD>
by --margin <MARGIN>
.
To view the list of allowed values for <GENERATOR> and <SCORER>, please run:
python3 test_GAN_SRM_Scorer.py --help
The models that are loaded are the latest trained ones.
The generated images are stored in the folder LOG_COMBINED
with the following structure:
File | Description |
---|---|
LOG_COMBINED | Logs folder. |
└ [date-time] | Date and time of the run |
├ output | Messages printed to standard output |
└ generated_samples | Directory containing the generated samples |
├ 64 | Directory containing generated images of size 64x64 (if model generates 64x64 images) |
└ 1000 | Directory containing generated images of size 1000x1000 |
To run experiments, images must have been generated and placed in the
./generated_images
folder with the following hierarchy structure:
File | Description |
---|---|
generated_images | Generated Images folder. |
├ legend.json | (Optional) Specifies correspondences between feature indices and feature names |
├ model_1 | Directory with the name of the model |
├ 64 | Folder with the 64x64 images generated (optional if not generated) |
└ 1000 | Folder with the 1000x1000 images generated |
├ model_2 | Directory with the name of the model |
├ 64 | Folder with the 64x64 images generated (optional if not generated) |
└ 1000 | Folder with the 1000x1000 images generated |
├ ... |
Compute the manual features on the generated images of all models under the folder ./generated_images
by running:
python3 extract_features.py
The results are stored in ./manual_features
folder with a structure similar to the ./generated_images
folder:
File | Description |
---|---|
manual_features | Generated Images folder. |
├ model_1 | Directory with the name of the model |
├ 64 | Folder with the features extracted on 64x64 images |
└ 1000 | Folder with the features extracted on 1000x1000 images |
├ model_2 | Directory with the name of the model |
├ 64 | Folder with the features extracted on 64x64 images |
└ 1000 | Folder with the features extracted on 1000x1000 images |
├ ... |
You can generate scores for 1000x1000 images of models under the folder ./generated_images
by running:
python3 baseline_score_generated.py --regressor_type <reg_name>
where <reg_name> is one of the manual feature regressors.
The results are stored in ./generated_images/model_name
for each available model_name
.
You can generate scores for the images of the labeled
dataset that represent galaxies (i,e labeled 1.0) by running:
python3 baseline_score_labeled.py --regressor_type <reg_name>
where <reg_name> is one of the manual feature regressors.
The results are stored in the current working directory.
Now that the directories are setup and features extracted, all that remains is to run all experiments using:
python3 run_experiments.py -all
An optional path to a legend json file can be specified using the option --legend <FILE_PATH>
to specify a dict from feature index to feature name. Otherwise, the feature index will be used as feature name.
The experiment results can be found uder the directory ./experiments_results
by default.
The above instructions should be sufficient to reproduce our models and experiments.
We note that for many *.py
files, you can get a some help information about the possible settable options and their descriptions by simply running:
python3 [filename] --help
- Manual Features Generation on
labeled
,scored
andquery
datasets: ~1h35min - Generation with patches baseline: ~5min
- Training:
- DCGAN: ~1h20min on Leonhard's GTX 1080 Ti GPU.
- MCGAN: ~1h20min on Leonhard's GTX 1080 Ti GPU.
- Stacked SRM: ~3h on Google Colab’s Tesla T4 GPU.
- FullresGan: ~1h30min on Leonhard's GTX 1080 Ti GPU.
- DCGAN scrorer: ~2h45min on Leonhard's GTX 1080 Ti GPU ( around 1h45min on a local GTX 970 GPU + SSD)
- XGBoost regressor: ~6min
- Random Forest regressor: ~16min
- Ridge regressor: ~1sec
File | Description |
---|---|
baseline_patches.py | Generative Model based on patches |
baseline_score_regressor.py | Train on the scored set and predict on the query set using a manual features regressor |
baseline_score_regressor_test.py | Predict scores on images in a provided directory using a trained manual features regressor |
baseline_score_generated.py | Creates the *.csv file on images in ./generated_images/model_name/1000 for each model_name using a trained manual features regressor |
baseline_score_labeled.py | Creates the *.csv file on galaxy images of the labeled dataset using trained manual features regressor |
downsample.py | Downsamples images in a given directory of 1000x1000 images down to 64x64 |
extract_features.py | Extracts manually crafted features from the images in ./generated_images/model_name for each model_name |
generate_feats.py | Generates manually crafted features on labeled , scored and query datasets |
data.py | Image/Manual Feature loading and preprocessing |
layers.py | Layers and Blocks used to build the Models |
utils.py | Functions used for manual features extraction and for experiments |
tools.py | Logger to log output to both terminal and file and some utility functions |
run_experiments.py | Run experiments on images in ./generated_images/model_name for each model_name |
DCGAN.py | Deep Convolutional Generative Adversarial Network Model |
DCGAN_Scorer.py | Scoring Model based on the Discriminator of DCGAN model |
MCGAN.py | Manual Feature Conditionned Generative Adversarial Network |
FullresGAN.py | Full 1000x1000 resolution Generative Adversial Network |
StackedSRM.py | Stacked SuperResolution Model |
train_DCGAN.py | Training File for the DCGAN |
train_DCGAN_for_score.py | Training file for the DCGAN based scoring model |
train_MCGAN.py | Training File for the MCGAN |
train_stackedSRM.py | Training File for the Stacked SRM |
train_FullresGAN.py | Training file for the FullresGAN model |
test_DCGAN_scorer.py | Predict scores using the DCGAN based scoring model on the query set |
test_GAN_SRM_Scorer.py | Image Generation using a GAN model with possible scorer filtering |
requirements.txt | List of dependencies |
README.md | README file |