Q-NAS

Neural Architecture Search using the Q-NAS algorithm and Tensorflow.

This repository contains code for the works presented in the following papers:

  1. D. Szwarcman, D. Civitarese and M. Vellasco, "Quantum-Inspired Neural Architecture Search," 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 2019, pp. 1-8. DOI

  2. D. Szwarcman, D. Civitarese and M. Vellasco, "Q-NAS Revisited: Exploring Evolution Fitness to Improve Efficiency," 2019 8th Brazilian Conference on Intelligent Systems (BRACIS), Salvador, Brazil, 2019, pp. 509-514. DOI

Requirements

The required Python packages are listed below. Make sure you have openmpi (https://www.open-mpi.org/) installed before installing the project requirements. The program was tested using openmpi-3.1.1.

The specific versions that we used in our runs are:

pyyaml==3.13
numpy (latest version)
mpi4py==3.0.0
tensorflow==2.3.0
psutil
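
To check that mpi4py can talk to the MPI installation before launching a full run, one can execute a minimal script under mpirun. This is a hedged sketch, not part of the repository; the file name test_mpi.py is just an illustration.

# test_mpi.py -- minimal mpi4py sanity check (illustrative, not repository code)
# Run with: mpirun -n 4 python test_mpi.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
print("rank %d of %d is alive" % (comm.Get_rank(), comm.Get_size()))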

All of our runs were executed in a multi-computer environment, with NVIDIA K80 GPUs and Power8 processors running Linux (Red Hat Enterprise Linux 7.4).


Running Q-NAS

The entire process is divided into 3 steps (one Python script for each):

  1. Dataset preparation
  2. Run architecture search
  3. Retrain final architecture

Optionally, the user can run the script run_profiling.py to get the number of parameters and FLOPs of one of the discovered architectures.

1. Dataset Preparation

The user can choose to work with one of these datasets: CIFAR-10 or CIFAR-100 (dataset details here).

The script run_dataset_prep.py prepares the dataset, as described in the papers, to use for the architecture search and/or retraining. If the original CIFAR is already downloaded, just point to the folder with the files using the parameter --data_path. Otherwise, the script will download it for you and save it in the location defined by --data_path.

Here's an example of how to prepare the CIFAR-10 dataset limited to 10k examples for training and validation:

python run_dataset_prep.py \
    --data_path cifar10 \
    --output_folder cifar_tfr_10000 \
    --num_classes 10 \
    --limit_data 10000

At the end of the process, the folder cifar10/cifar_tfr_10000 has the following files:

cifar_train_mean.npz
data_info.txt
test_1.tfrecords
train_1.tfrecords
valid_1.tfrecords

The tfrecords files contain the images and labels, data_info.txt includes basic information about this dataset, and cifar_train_mean.npz holds the numpy array with the mean of the training images.
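
To double-check the prepared data, a short Python snippet can list the arrays stored in the mean file and count the serialized examples in a split. This is a minimal sketch assuming the folder layout shown above; the key names inside the .npz file are not documented here, so it only lists them.

import numpy as np
import tensorflow as tf

folder = "cifar10/cifar_tfr_10000"  # folder produced by the example above

# List the arrays stored in the training-mean file.
mean = np.load(folder + "/cifar_train_mean.npz")
print("arrays in cifar_train_mean.npz:", mean.files)

# Count the serialized examples in the training split without parsing them.
num_train = sum(1 for _ in tf.data.TFRecordDataset(folder + "/train_1.tfrecords"))
print("training examples:", num_train)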

This example shows how to prepare the CIFAR-100 dataset, with all the available training examples:

python run_dataset_prep.py \
    --data_path cifar100 \
    --output_folder cifar_tfr \
    --num_classes 100 \
    --label_mode fine \
    --limit_data 0

Run python run_dataset_prep.py --help for additional parameter details.

2. Run architecture search

All the configurable parameters to run the architecture search with Q-NAS are set in a yaml configuration file.
This file sets 2 groups of parameters, namely: QNAS (parameters related to the evolution itself) and train
(parameters related to the training session conducted to evaluate the architectures). The following template shows the type and meaning of each parameter in the configuration file:

QNAS:
    crossover_rate:      (float) crossover rate [0.0, 1.0]
    max_generations:     (int) maximum number of generations to run the algorithm
    max_num_nodes:       (int) maximum number of nodes in the network
    num_quantum_ind:     (int) number of quantum individuals
    penalize_number:     (int) maximum number of reducing layers in the networks without penalization
    repetition:          (int) number of classical individuals each quantum individual will generate
    replace_method:      (str) selection mechanism; 'best' or 'elitism'
    update_quantum_gen:  (int) generation frequency to update quantum genes
    update_quantum_rate: (float) rate for the quantum update (similar to crossover rate)
    save_data_freq:      (int) generation frequency to save train data of the best model of the
                           current generation in csv files; if the user does not want this, set it to 0.

    params_ranges:       # Set to *value* if the user wants this value instead of evolving the parameter
        decay:           (float) or (list); ignored if optimizer = Momentum
        learning_rate:   (float) or (list);
        momentum:        (float) or (list);
        weight_decay:    (float) or (list);

    function_dict: {'function_name': {'function': 'function_class', 
                                      'params': {'parameter': value}, 
                                      'prob': probability_value}}

train:
    batch_size:          (int) number of examples in a batch to train the networks.
    eval_batch_size:     (int) batch size for evaluation; if set to 0, the number of validation images is used
    max_epochs:          (int) maximum number of epochs to train the networks.
    epochs_to_eval:      (int) fitness is defined as the maximum accuracy in the last *epochs_to_eval* epochs
    optimizer:           (str) RMSProp or Momentum

    # Dataset
    dataset:             (str) Cifar10 or CIFAR100
    data_augmentation:   (bool) True if data augmentation methods should be applied
    subtract_mean:       (bool) True if the dataset mean image should be subtracted from images

    # Tensorflow
    save_checkpoints_epochs: (int) number of epochs to save a new checkpoint.
    save_summary_epochs:     (float) number of epochs (or fraction of an epoch) to save new summary
    threads:                 (int) number of threads for Tensorflow ops (0 -> number of logical cores)

We provide 3 configuration file examples in the folder config_files; one can use them as-is, or modify as needed.
In summary, the files are:

  • config1.txt evolves both the architecture and some hyperparameters of the network
  • config2.txt evolves only the architecture and adopts penalization
  • config3.txt evolves only the architecture with residual blocks and adopts penalization
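
Since the configuration is plain YAML, it can also be inspected programmatically with pyyaml (already in the requirements). The snippet below is a hedged sketch rather than repository code: it assumes config1.txt parses with yaml.safe_load and uses the key names from the template above to derive the number of MPI workers needed for the search (see the run example that follows).

import yaml

# Load one of the provided configuration files.
with open("config_files/config1.txt") as f:
    config = yaml.safe_load(f)

# Number of classical individuals = number of MPI workers required.
qnas = config["QNAS"]
workers = qnas["num_quantum_ind"] * qnas["repetition"]
print("required MPI workers:", workers)  # 5 x 4 = 20 for config1.txt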

This is an example of how to run architecture search for dataset cifar10/cifar_tfr_10000 with config1.txt:

mpirun -n 20 python run_evolution.py \
    --experiment_path my_exp_config1 \
    --config_file config_files/config1.txt \
    --data_path cifar10/cifar_tfr_10000 \
    --log_level INFO

The number of workers in the MPI execution must be equal to the number of classical individuals. In config1.txt,
this number is 20 (num_quantum_ind (=5) x repetition (=4) = 20). The output folder my_exp_config1
looks like this:

12_7
csv_data
data_QNAS.pkl
log_params_evolution.txt
log_QNAS.txt

The folder 12_7 has the Tensorflow files for the best network in the evolution; in this case, it is individual number 7, found in generation 12. The folder csv_data has csv files with training information of the individuals (loss and accuracy for the best individuals in some generations). Neither of these directories is used in later steps; they just contain information that one might want to inspect.

The file data_QNAS.pkl keeps all the evolution data (chromosomes, fitness values, number of evaluations, best individual ID ...). All the parameters (configuration file and command-line) are saved in log_params_evolution.txt, and log_QNAS.txt logs the evolution progression.
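
The evolution data can be inspected offline with Python's pickle module. The sketch below is a hedged example: the exact object layout inside data_QNAS.pkl is not documented here, so it only loads the file and reports the top-level structure (run it from the repository root so that any project classes referenced by the pickle can be resolved).

import pickle

with open("my_exp_config1/data_QNAS.pkl", "rb") as f:
    data = pickle.load(f)

# The file stores chromosomes, fitness values, number of evaluations and the
# best individual ID; print what the top-level object looks like.
print(type(data))
if isinstance(data, dict):
    print(list(data.keys()))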

It is also possible to continue a finished evolution process. Note that all the parameters will be set as in
log_params_evolution.txt, ignoring the values in the file indicated by --config_file. The only parameter that can be overridden is max_generations, so that one can set how many more generations the evolution will run. To continue the above experiment for another 100 generations, the user can run:

mpirun -n 20 python run_evolution.py \
    --experiment_path my_exp_config1/continue \
    --config_file config_files/config1.txt \
    --data_path cifar10/cifar_tfr_10000 \
    --log_level INFO \
    --continue_path my_exp_config1

Run python run_evolution.py --help for additional parameter details.

3. Retrain network

After the evolution is complete, the final network can be retrained on the entire dataset (see papers for details). Here's an example of how to retrain the best network of the experiment saved in my_exp_config1 for 300 epochs with the dataset in cifar10/cifar_tfr, using the scheme (optimizer and hyperparameters) of the evolution:

python run_retrain.py \
    --experiment_path my_exp_config1 \
    --data_path cifar10/cifar_tfr \
    --log_level INFO \
    --max_epochs 300 \
    --batch_size 256 \
    --eval_batch_size 1000 \
    --threads 8 \
    --run_train_eval

After the training is complete, the directory my_exp_config1/retrain will contain the following files:

best
eval
eval_test
eval_train_eval
checkpoint
events.out.tfevents
graph.pbtxt
log_params_retrain.txt
model.ckpt-52500.data-00000-of-00001
model.ckpt-52500.index
model.ckpt-52500.meta

In the folder best, we have the best validation model saved. The file log_params_retrain.txt summarizes the training parameters. The other files and folders are generated by Tensorflow, including the last model saved, the graph and events for Tensorboard.
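
To locate the last checkpoint written during retraining (for example, to restore the weights somewhere else), tf.train.latest_checkpoint can read the checkpoint file listed above. A minimal sketch:

import tensorflow as tf

# Reads the 'checkpoint' file in the retrain folder and returns the newest
# checkpoint prefix, e.g. my_exp_config1/retrain/model.ckpt-52500
ckpt_prefix = tf.train.latest_checkpoint("my_exp_config1/retrain")
print(ckpt_prefix)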

It is also possible to retrain the network with training schemes defined in the literature (check the help message for the --lr_schedule parameter). For example, to retrain the best network of experiment my_exp_config2 using the cosine scheme, one can run:

python run_retrain.py \
    --experiment_path my_exp_config2 \
    --data_path cifar10/cifar_tfr \
    --log_level INFO \
    --batch_size 256 \
    --eval_batch_size 1000 \
    --retrain_folder train_cosine \
    --threads 8 \
    --lr_schedule cosine \
    --run_train_eval

The script run_retrain.py also supports retraining any individual saved in data_QNAS.pkl: use the parameters --generation and --individual to indicate which one you want to train. Run python run_retrain.py --help for additional parameter details.

Profile architecture

If one wants to get the number of weights and the MFLOPs of a specific individual, he/she can run run_profiling.py. For example, to get the values for individual 1 of generation 50 of the experiment saved in my_exp_config3, run:

python run_profiling.py \
    --exp_path my_exp_config3 \
    --generation 50 \
    --individual 1

Acknowledgements

This code was developed by Daniela Szwarcman while she was a PhD candidate in Electrical Engineering at PUC-Rio [1] and a PhD intern at IBM Research.

1. Advisor: Prof. Marley Vellasco (PUC-Rio)
 Co-advisor: Daniel Civitarese (IBM Research)
