norvin2002/Deep-Learning-Spectroscopy

Usage

  • Create a new folder; it represents a new experiment.
  • Write a script that does the following:
    • clones this repository
    • checks out a specific revision (the revision of the code used to run the experiment; since it is in the repo, the revision identifies the exact code used)
    • invokes the Python script with the necessary arguments, which ensures that the arguments used for the experiment are saved
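The steps above could be sketched as a small driver script kept in the experiment folder. This is only an illustration: the function names, the `code` clone directory, the repository URL (inferred from the repository name) and the `<commit-hash>` placeholder are assumptions, not part of this repository. Any `THEANO_FLAGS` set in the calling shell are inherited by the launched process.

```python
import subprocess

# Assumed values: the URL is inferred from the repository name and the
# revision is a placeholder; substitute the commit you actually want.
REPO_URL = "https://github.com/norvin2002/Deep-Learning-Spectroscopy.git"
REVISION = "<commit-hash>"


def build_commands(repo_url, revision, script, script_args, clone_dir="code"):
    """Return the argv lists for the clone, the checkout, and the run."""
    return [
        ["git", "clone", repo_url, clone_dir],
        ["git", "-C", clone_dir, "checkout", revision],
        ["python", clone_dir + "/" + script] + list(script_args),
    ]


def run_experiment(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.check_call(argv)


# The arguments live in the script, so they are saved with the experiment.
commands = build_commands(
    REPO_URL,
    REVISION,
    "deep_tensor_refactored/train.py",
    ["--learn_rate", "0.00001", "--clen", "40", "--batch_size", "50",
     "--num_neu_1", "100", "--num_neu_2", "200",
     "--model_name", "model_neu1_neu2_with_noise",
     "data.xyz", "spectra.npz"],
)
# run_experiment(commands)  # uncomment to clone, check out, and train
```

Keeping the command construction separate from execution makes it easy to print or log the exact commands before anything is run.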

NOTE: At any time, only one experiment runs inside a folder, so the above steps are sufficient. If you need to run a different revision of another script in this repo (e.g. vis.py), you would need to clone the repo again and check out the specific revision of that script. This is clumsy, but until a better way is figured out, this is our approach.

Code for deep learning models to predict molecular electronic properties. For the deep tensor neural network model, I built upon the code from Peter Bjørn Jørgensen (DTU) and would like to thank him for sharing his implementation.

Code corresponding to the paper: Deep Learning Spectroscopy: Neural Networks for Molecular Excitation Spectra (PDF)

Software versions used:

  • CUDA - 8.0.61 (did not use CuDNN)
  • Python - 2.7
  • Theano - 0.9.0
  • PyGPU - 0.6.4
  • Lasagne - 0.1
  • NumPy - 1.12.1
  • SciPy - 0.19.0

To run the DTNN code, use the following hyperparameters:

THEANO_FLAGS=mode=FAST_RUN,device=cuda,floatX=float32,openmp=True python deep_tensor_refactored/train.py --learn_rate 0.00001 --clen 40 --batch_size 50 --num_neu_1 100 --num_neu_2 200 --model_name model_neu1_neu2_with_noise data.xyz spectra.npz

Running the CNN code is similar:

THEANO_FLAGS=mode=FAST_RUN,device=cuda,floatX=float32,openmp=True python CNN_refactored/train.py --learn_rate 0.0001 --coulomb_dim 29 -c 22 -c 47 -c 42 --batch_size 90 --model_name orig_model coulomb.npz spectra.npz

The data that we trained on can be found here: https://zenodo.org/record/3386508

NOTE: After publishing the paper, we noticed that twelve molecules in the dataset have fewer than 16 energy levels. To be consistent with the text in the paper, please remove these molecules when running the experiments. We do not expect the results to change significantly, since only a very small portion of the dataset (12 molecules out of 132k) is affected. The affected molecules, which were identified with a script, are listed below.

Index in the dataset    Molecule IUPAC
0                       CH4
1                       C2H6
2                       C3H4
3                       C2OH4
4                       C3H8
5                       C2OH6
6                       COCH6
7                       C3H6
8                       C2OH4
10                      C4H2
11                      C4H6
12                      C4H6
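As a minimal sketch of how the listed molecules might be dropped before training, the helper below computes which dataset positions to keep. It assumes the indices in the table are zero-based positions shared by the structure file and the spectra file; the function names are hypothetical and not part of this repository.

```python
# Indices taken from the table above.
EXCLUDED_INDICES = [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12]


def retained_indices(n_molecules, excluded=EXCLUDED_INDICES):
    """Return the dataset indices that remain after dropping `excluded`."""
    excluded_set = set(excluded)
    return [i for i in range(n_molecules) if i not in excluded_set]


def filter_records(records, excluded=EXCLUDED_INDICES):
    """Return the records whose position is not in `excluded`."""
    return [records[i] for i in retained_indices(len(records), excluded)]
```

The list returned by `retained_indices` can also be used to index a NumPy array along its first axis, so the same selection can be applied to the spectra and to the input structures to keep them aligned.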

Pre-trained models can be found here: https://zenodo.org/record/3386531

To predict on new Coulomb matrices with the pretrained CNN (note: due to a bug, we trained on the negative of the Coulomb matrices, so negate new CMs before predicting on them):

THEANO_FLAGS=mode=FAST_RUN,device=cuda,floatX=float32,openmp=True python CNN_refactored/predict.py --coulomb_dim 29 -c 22 -c 47 -c 42 --batch_size 90 --model_name orig_model $data_path/cm_10k_neg.txt $saved_model_path/results/model_epoch880.pkl.gz $saved_model_path/Y_vals.npz
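Because the CNN was trained on negated Coulomb matrices, new matrices need their sign flipped before prediction. The sketch below shows one way to do that for a text file. The file layout is an assumption (whitespace-separated numbers on each line), since the format of `cm_10k_neg.txt` is not described here, and the function names are hypothetical.

```python
def negate_line(line):
    """Negate every whitespace-separated number on one line of text."""
    # 0.0 - x (rather than -x) keeps zero padding as 0.0 instead of -0.0.
    return " ".join(repr(0.0 - float(token)) for token in line.split())


def negate_file(src_path, dst_path):
    """Write a copy of src_path with every value negated, line by line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(negate_line(line) + "\n")
```

For example, `negate_file("cm_10k.txt", "cm_10k_neg.txt")` would produce the negated input, where `cm_10k.txt` is a hypothetical name for the un-negated file.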

To predict on new XYZ files using the pretrained DTNN:

OMP_NUM_THREADS=8 THEANO_FLAGS=mode=FAST_RUN,device=cuda,floatX=float32,openmp=True python deep_tensor_refactored/predict.py --clen 40 --batch_size 50 --num_neu_1 100 --num_neu_2 200 --model_name model_neu1_neu2_with_noise $data_path/new.xyz $saved_model_path/results/model_epoch9998.pkl.gz $saved_model_path/Y_vals.npz
