Usage

Create a new folder; it represents a new experiment.
Write a script that does the following:
- clones this repository
- checks out a specific revision (the revision of the code used to run the experiment; since the code is in the repo, the revision identifies exactly what was run)
- invokes the Python script with the necessary arguments, so that the arguments used for the experiment are saved
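
The steps above can be sketched as a small driver script. This is a minimal sketch, not the authors' tooling: the repository URL, the `<revision>` placeholder, the `repo` directory name, and the helper names `build_experiment_steps` and `run_experiment` are assumptions for illustration. The driver is written for Python 3, even though the experiments themselves ran under Python 2.7.

```python
import os
import subprocess


def build_experiment_steps(repo_url, revision, script, script_args, clone_dir="repo"):
    """Return the three commands of one experiment as argument lists."""
    return [
        ["git", "clone", repo_url, clone_dir],
        ["git", "-C", clone_dir, "checkout", revision],
        ["python", os.path.join(clone_dir, script)] + list(script_args),
    ]


def run_experiment(steps, experiment_dir="."):
    """Run each command inside the experiment folder, stopping at the first failure."""
    for command in steps:
        subprocess.run(command, cwd=experiment_dir, check=True)


# Placeholder values (assumptions): substitute the real URL, revision, and arguments.
steps = build_experiment_steps(
    repo_url="https://github.com/norvin2002/Deep-Learning-Spectroscopy.git",
    revision="<revision>",
    script="deep_tensor_refactored/train.py",
    script_args=["--learn_rate", "0.00001", "data.xyz", "spectra.npz"],
)
# run_experiment(steps)  # uncomment to actually clone, check out, and train
```

Keeping such a script inside the experiment folder means the folder itself records which revision and which arguments produced its results.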

NOTE: At any time, only one experiment runs inside a folder, so the above steps are fine. If you need to execute a different revision of another script in this repo (e.g. vis.py), you need to clone the repo again and check out the specific revision that contains the script. This is clumsy, but until a better way is figured out, this is our approach.

Code for deep learning models to predict molecular electronic properties. For the deep tensor neural network (DTNN) model, I built upon the code from Peter Bjørn Jørgensen (DTU) and would like to thank him for sharing his implementation.

Code corresponding to the paper: Deep Learning Spectroscopy: Neural Networks for Molecular Excitation Spectra (PDF)

Software versions used:

CUDA - 8.0.61 (did not use CuDNN)
Python - 2.7
Theano - 0.9.0
PyGPU - 0.6.4
Lasagne - 0.1
NumPy - 1.12.1
SciPy - 0.19.0
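
To compare an environment against the versions listed above, a check along the following lines may help. It is a sketch under assumptions: the import names (`theano`, `pygpu`, `lasagne`, `numpy`, `scipy`) are assumed to match the packages, each module is assumed to expose `__version__`, and the helper names are hypothetical. CUDA and Python itself are not checked here.

```python
import importlib

# Versions listed above, keyed by the (assumed) import name of each package.
EXPECTED_VERSIONS = {
    "theano": "0.9.0",
    "pygpu": "0.6.4",
    "lasagne": "0.1",
    "numpy": "1.12.1",
    "scipy": "0.19.0",
}


def installed_versions(names):
    """Map each module name to its __version__, or None if it cannot be imported."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
        else:
            found[name] = getattr(module, "__version__", None)
    return found


def find_mismatches(installed, expected=EXPECTED_VERSIONS):
    """Return {name: (installed, expected)} for every version that differs."""
    return {
        name: (installed.get(name), wanted)
        for name, wanted in expected.items()
        if installed.get(name) != wanted
    }
```

Calling `find_mismatches(installed_versions(EXPECTED_VERSIONS))` returns an empty dictionary when every listed package matches.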

To run the DTNN code, use the following hyperparameters:

THEANO_FLAGS=mode=FAST_RUN,device=cuda,floatX=float32,openmp=True python deep_tensor_refactored/train.py --learn_rate 0.00001 --clen 40 --batch_size 50 --num_neu_1 100 --num_neu_2 200 --model_name model_neu1_neu2_with_noise data.xyz spectra.npz
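
The command above combines two things: Theano settings passed through the `THEANO_FLAGS` environment variable, and hyperparameters passed as command-line arguments. The sketch below assembles both from plain data so they can be stored alongside an experiment. The helper names `theano_flags` and `training_command` are hypothetical, and values are kept as strings so they appear exactly as typed.

```python
import os

# Theano settings from the command above, in the same order.
THEANO_SETTINGS = [
    ("mode", "FAST_RUN"),
    ("device", "cuda"),
    ("floatX", "float32"),
    ("openmp", "True"),
]

# DTNN hyperparameters from the command above.
DTNN_HYPERPARAMETERS = [
    ("learn_rate", "0.00001"),
    ("clen", "40"),
    ("batch_size", "50"),
    ("num_neu_1", "100"),
    ("num_neu_2", "200"),
    ("model_name", "model_neu1_neu2_with_noise"),
]


def theano_flags(settings):
    """Join (key, value) pairs into the comma-separated THEANO_FLAGS format."""
    return ",".join("{}={}".format(key, value) for key, value in settings)


def training_command(script, hyperparameters, inputs):
    """Build the argument list: script, --name value pairs, then input files."""
    command = ["python", script]
    for name, value in hyperparameters:
        command += ["--" + name, value]
    return command + list(inputs)


env = dict(os.environ, THEANO_FLAGS=theano_flags(THEANO_SETTINGS))
command = training_command(
    "deep_tensor_refactored/train.py", DTNN_HYPERPARAMETERS, ["data.xyz", "spectra.npz"]
)
# subprocess.run(command, env=env, check=True) would launch the training run.
```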

Running the CNN code is similar:

THEANO_FLAGS=mode=FAST_RUN,device=cuda,floatX=float32,openmp=True python CNN_refactored/train.py --learn_rate 0.0001 --coulomb_dim 29 -c 22 -c 47 -c 42 --batch_size 90 --model_name orig_model coulomb.npz spectra.npz

The data that we trained on can be found here: https://zenodo.org/record/3386508

❗ NOTE: After publishing the paper, we noticed that about twelve molecules in the dataset have fewer than 16 energy levels. To be consistent with the text in the paper, please remove these molecules when running the experiments. We do not expect the results to change significantly, since only a very small portion of the dataset (12 molecules out of 132k) is affected. The following script was used to identify the molecules.

Index in the dataset	Molecule IUPAC
0	CH4
1	C2H6
2	C3H4
3	C2OH4
4	C3H8
5	C2OH6
6	COCH6
7	C3H6
8	C2OH4
10	C4H2
11	C4H6
12	C4H6
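
One way to exclude the molecules listed above is to filter by index before training. This is a minimal sketch that does not depend on any script in the repository: it assumes the indices in the table are zero-based positions in dataset order, and the helper names are hypothetical.

```python
# Dataset indices of the affected molecules, copied from the table above.
AFFECTED_INDICES = frozenset([0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12])


def indices_to_keep(num_molecules, excluded=AFFECTED_INDICES):
    """Return the dataset indices that remain after dropping the excluded ones."""
    return [i for i in range(num_molecules) if i not in excluded]


def drop_excluded(items, excluded=AFFECTED_INDICES):
    """Filter any per-molecule sequence (geometries, spectra, ...) in dataset order."""
    return [item for i, item in enumerate(items) if i not in excluded]
```

The list returned by `indices_to_keep` can also be used to index NumPy arrays, so the same selection can be applied consistently to the inputs and the spectra.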

Pre-trained models can be found here: https://zenodo.org/record/3386531

To predict on new Coulomb matrices with the pretrained CNN (note: due to a bug, we trained on the negative of the Coulomb matrices, so negate new CMs before predicting on them):

THEANO_FLAGS=mode=FAST_RUN,device=cuda,floatX=float32,openmp=True python CNN_refactored/predict.py --coulomb_dim 29 -c 22 -c 47 -c 42 --batch_size 90 --model_name orig_model $data_path/cm_10k_neg.txt $saved_model_path/results/model_epoch880.pkl.gz $saved_model_path/Y_vals.npz
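
The sign flip required for the pretrained CNN amounts to multiplying every Coulomb matrix entry by -1 before prediction. The sketch below shows this on a nested list; the function name and the toy values are made up for illustration, and the on-disk format of the input file is not covered because it is not described here.

```python
def negate_coulomb_matrix(matrix):
    """Return a new matrix with every entry multiplied by -1."""
    return [[-value for value in row] for row in matrix]


# Toy 2x2 matrix with made-up values, only to show the sign flip.
example = [[36.9, 5.5], [5.5, 0.5]]
negated = negate_coulomb_matrix(example)
```

With NumPy arrays the same operation is simply `-cm`.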

To predict on new XYZ files using the pretrained DTNN:

OMP_NUM_THREADS=8 THEANO_FLAGS=mode=FAST_RUN,device=cuda,floatX=float32,openmp=True python deep_tensor_refactored/predict.py --clen 40 --batch_size 50 --num_neu_1 100 --num_neu_2 200 --model_name model_neu1_neu2_with_noise $data_path/new.xyz $saved_model_path/results/model_epoch9998.pkl.gz $saved_model_path/Y_vals.npz

norvin2002/Deep-Learning-Spectroscopy
