This package implements AGD, our active-learning-based framework for materials design. This is the official Python repository.
Machine Learning and Evolution Laboratory
Department of Computer Science and Engineering
University of South Carolina
How to cite:
The package provides 3 major functions:
- Perform active-learning-based sampling over the whole latent design space (based on Bayesian Optimization).
- Train and evaluate the performance of a screening model (based on Roost).
- Generate CIF files for material candidates by element substitution (based on ELMD).
The following paper describes the details of our framework: Active learning based generative design for discovery of wide band gap materials
Install any of the relevant packages if not already installed:
- Bayesian Optimization (tested on 1.2.0)
- tensorflow (tested on 2.2.0)
- GATGNN documentation.
- Roost documentation.
- Numpy (tested on 1.18.5)
- Pandas (tested on 1.1.0)
- Scikit-learn (tested on 0.21.3)
- Pymatgen (tested on 2020.3.13)
The commands below install Bayesian Optimization, NumPy, Pandas, Scikit-learn, and Pymatgen; PyTorch is also required and must be installed separately:
conda install -c conda-forge bayesian-optimization
pip install numpy
pip install pandas
pip install scikit-learn
pip install pymatgen
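To compare an existing environment against the tested versions listed above, a small check like the following may help. It is only a sketch: it assumes Python 3.8 or later (for importlib.metadata) and assumes the PyPI distribution names shown in the dictionary. The helper name installed_versions is hypothetical and not part of this package.

```python
from importlib import metadata

# Versions this README reports testing against.
# Keys are assumed PyPI distribution names.
TESTED = {
    "bayesian-optimization": "1.2.0",
    "numpy": "1.18.5",
    "pandas": "1.1.0",
    "scikit-learn": "0.21.3",
    "pymatgen": "2020.3.13",
}


def installed_versions(packages):
    """Return {name: installed version, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


# Print one line per package so mismatches are easy to spot.
for name, version in installed_versions(TESTED).items():
    status = version if version is not None else "not installed"
    print(f"{name}: {status} (tested on {TESTED[name]})")
```

A newer installed version is not necessarily a problem; the listed versions are simply the ones the package was tested on.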
- Download the compressed file of our dataset using this link
- Unzip its contents (two .csv files and 5 pre-trained models)
- Move the .csv files into your AML_Roost directory, i.e. such that the datapath now exists.
Once all the aforementioned requirements are satisfied, you can generate candidate materials with the target property by running ALSearch.py in the terminal with the appropriate flags. At a minimum, use --budget to specify the active-learning budget and --kappa to control the balance between exploration and exploitation.
- Example: start the active-learning process with a given budget and kappa.
python ALSearch.py --budget 50 --kappa 100 --candidate_out_path path/you/prefer
The generated materials and their predicted properties are written automatically to the specified folder.
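To build intuition for what --kappa trades off, the sketch below uses an upper-confidence-bound (UCB) rule, a common acquisition function in Bayesian optimization. It assumes kappa plays this UCB role in ALSearch.py, which this README does not confirm. The function names and the numbers are hypothetical and chosen only for illustration.

```python
def ucb_score(mean, std, kappa):
    """Upper confidence bound: predicted value plus kappa times uncertainty."""
    return mean + kappa * std


def pick_next(candidates, kappa):
    """Return the name of the candidate with the highest UCB score.

    `candidates` maps a name to (predicted mean, predicted std).
    """
    def score(name):
        mean, std = candidates[name]
        return ucb_score(mean, std, kappa)

    return max(candidates, key=score)


# Hypothetical surrogate predictions for three latent points (made-up numbers).
candidates = {
    "well_explored": (4.0, 0.01),
    "moderate": (3.5, 0.05),
    "unexplored": (1.0, 0.50),
}

print(pick_next(candidates, kappa=0))    # well_explored (highest mean)
print(pick_next(candidates, kappa=100))  # unexplored (highest uncertainty)
```

Under this assumption, a small kappa favors points the surrogate model already predicts to be good, while a large kappa such as 100 pushes sampling toward uncertain regions of the latent space.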
After acquiring the active-learning augmented data, you can train a screening model and evaluate its performance using the Roost package and the GAN-generated dataset. The 5 augmented datasets, corresponding to Exp1, Exp2_BS, Exp2_AL, Exp3_BS, and Exp3_AL in the paper, are in /root_path/roost/roost/examples/prepared_training_data/
The 5 pre-trained models at the figshare link correspond to Exp1, Exp2_BS, Exp2_AL, Exp3_BS, and Exp3_AL.
Under roost/roost/examples, you can train a model and evaluate its performance on a hold-out dataset:
python roost-predict.py --data-path /root_path/roost/roost/examples/prepared_training_data/Exp3_AL_1153.csv --train --evaluate --val-size 0.2 --epochs 200 --run-id 311
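For readers unfamiliar with hold-out evaluation, the sketch below shows what a validation fraction of 0.2 means in general terms. It is not taken from roost-predict.py, whose actual splitting logic may differ; the function holdout_split and its seed parameter are hypothetical.

```python
import random


def holdout_split(rows, val_size, seed=0):
    """Shuffle rows and set aside a fraction `val_size` for validation.

    Returns (train_rows, val_rows).
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_val = round(len(shuffled) * val_size)
    return shuffled[n_val:], shuffled[:n_val]


# With 100 rows and val_size=0.2, 20 rows are held out for validation.
train, val = holdout_split(range(100), val_size=0.2)
print(len(train), len(val))  # 80 20
```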
The independent test dataset is in the folder roost/roost/examples/prepared_training_data/. Under roost/roost/examples, run:
python roost-predict.py --test-path /root_path/roost/roost/examples/prepared_training_data/bd_test_only.csv --regression --evaluate --run-id 3
To test the recovery rate:
python screen_recover_rate.py
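This README does not define the recovery rate computed by screen_recover_rate.py. One plausible reading, assumed here purely for illustration, is the fraction of known target materials that reappear among the screened candidates. The function name and the example formulas below are hypothetical.

```python
def recovery_rate(known_targets, screened):
    """Fraction of known target materials found among the screened candidates.

    Assumed definition: |known ∩ screened| / |known|.
    """
    known = set(known_targets)
    if not known:
        raise ValueError("known_targets must not be empty")
    return len(known & set(screened)) / len(known)


# Hypothetical formulas, for illustration only.
known = ["MgO", "AlN", "BN", "SiO2"]
screened = ["AlN", "GaN", "SiO2", "ZnO"]
print(recovery_rate(known, screened))  # 0.5
```

Consult the script itself for the definition actually used in the paper.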