ChemGE: Population-based De Novo Molecule Generation, Using Grammatical Evolution

Molecule design using grammatical evolution. The paper is available at https://arxiv.org/abs/1804.02134

Update: The ChemGE paper has been published in Chemistry Letters: https://doi.org/10.1246/cl.180665
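For readers new to grammatical evolution, the sketch below shows the general idea of decoding an integer chromosome into a string through a context-free grammar: the leftmost non-terminal is expanded with the production selected by `codon % number_of_productions`. This is only an illustration under stated assumptions. `TOY_GRAMMAR` and `decode` are hypothetical names, and the three-rule grammar is a toy, not the SMILES grammar in zinc_grammar.py or the code used in the experiments.

```python
# Toy grammar (an assumption for illustration): each non-terminal maps to
# a list of productions, and each production is a list of symbols.
TOY_GRAMMAR = {
    "smiles": [["chain"]],
    "chain": [["atom"], ["atom", "chain"]],
    "atom": [["C"], ["N"], ["O"]],
}


def decode(chromosome, grammar=TOY_GRAMMAR, start="smiles", max_steps=100):
    """Map a list of integers to a string by leftmost derivation.

    Returns None if the chromosome is empty or the derivation needs more
    than max_steps expansions (codons are reused cyclically).
    """
    if not chromosome:
        return None
    symbols = [start]   # symbols still to process, leftmost first
    output = []         # terminals emitted so far
    used = 0            # number of codons consumed (also counts expansions)
    while symbols:
        symbol = symbols.pop(0)
        if symbol not in grammar:
            # Terminal symbol: copy it to the output.
            output.append(symbol)
            continue
        if used >= max_steps:
            return None
        productions = grammar[symbol]
        codon = chromosome[used % len(chromosome)]
        used += 1
        # The codon selects one production for this non-terminal.
        chosen = productions[codon % len(productions)]
        symbols = list(chosen) + symbols
    return "".join(output)


print(decode([0, 1, 0, 0, 2]))
```

Mutating a single integer in the chromosome changes which production is chosen at that step, which is how a population of chromosomes can be evolved toward higher-scoring strings.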

The advantages of ChemGE are:

  • Faster SMILES generation
  • Inherent parallelism
  • Novelty and diversity in designed molecules

This repository provides the code used in our experiments:

  1. Benchmark on druglikeness score (optimize-J.py)
  2. Design of high-scoring molecules for thymidine kinase (optimize-rdock.py)
  3. Scalability of ChemGE in a parallel environment (optimize-rdock-qsub.py)

Requirements

  1. Python
  2. RDKit
  3. rDock

How to set up on Ubuntu 16.04

Compiling rDock may fail with newer compilers, so we recommend using Ubuntu 16.04.

Install Python libraries

git clone https://github.com/pyenv/pyenv.git $HOME/.pyenv
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> $HOME/.bashrc
echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> $HOME/.bashrc
echo -e 'if command -v pyenv 1>/dev/null 2>&1; then\n  eval "$(pyenv init -)"\nfi' >> $HOME/.bashrc
source $HOME/.bashrc
pyenv install anaconda3-5.0.1
pyenv global anaconda3-5.0.1
git clone https://github.com/pyenv/pyenv-virtualenv.git $(pyenv root)/plugins/pyenv-virtualenv
exec "$SHELL"
conda create -c rdkit -n my-rdkit-env rdkit
source activate my-rdkit-env
pip install nltk networkx
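After the installation, a quick check that the three libraries can be found from the active environment may save time later. The following is a minimal sketch; `missing_modules` is a hypothetical helper name, not part of ChemGE.

```python
import importlib.util


def missing_modules(names=("rdkit", "nltk", "networkx")):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required Python libraries were found.")
```

Run it inside my-rdkit-env; an empty result means the imports used by the scripts should resolve.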

Install rDock

sudo apt-get update
sudo apt-get install build-essential libcppunit-dev libpopt-dev
cd ~
wget https://sourceforge.net/projects/rdock/files/rDock_2013.1_src.tar.gz 
tar xf rDock_2013.1_src.tar.gz
cd rDock_2013.1_src/build/
make linux-g++-64
echo 'export RBT_ROOT="$HOME/rDock_2013.1_src"' >> $HOME/.bashrc
echo 'export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$RBT_ROOT/lib"' >> $HOME/.bashrc
echo 'export PATH="$PATH:$RBT_ROOT/bin"' >> $HOME/.bashrc
source $HOME/.bashrc

Download ChemGE

git clone https://github.com/tsudalab/ChemGE.git
cd ChemGE

Usage

Run the following commands inside the my-rdkit-env environment (activate it with source activate my-rdkit-env).

Optimize J

python -u optimize-J.py > log-file &

Optimize docking score (multi-thread)

python -u optimize-rdock.py > log-file &

If every score is 10000000000.0, the installation of rDock may have failed. Please check the installation directory.
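One way to narrow down such a failure is to check the environment variables set during the rDock installation. The sketch below assumes that the docking scripts rely on the `rbdock` and `rbcavity` executables being on PATH and on RBT_ROOT being set, as configured in the setup steps above; `check_rdock_env` is a hypothetical helper, not part of ChemGE.

```python
import os
import shutil


def check_rdock_env(environ=None, which=shutil.which):
    """Return a list of human-readable problems with the rDock setup."""
    environ = os.environ if environ is None else environ
    problems = []

    rbt_root = environ.get("RBT_ROOT")
    if not rbt_root:
        problems.append("RBT_ROOT is not set")
    elif not os.path.isdir(rbt_root):
        problems.append("RBT_ROOT does not point to a directory: " + rbt_root)

    # Assumption: these rDock executables must be reachable through PATH.
    for exe in ("rbdock", "rbcavity"):
        if which(exe) is None:
            problems.append(exe + " not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_rdock_env():
        print("Problem:", problem)
```

An empty list does not prove that docking works, but a non-empty one points to the part of the installation to revisit.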

Optimize docking score (qsub)

This program requires a parallel environment with qsub. We recommend using CfnCluster.

python -u optimize-rdock-qsub.py > log-file &

Explanation of files

  • results/: log files of our experiment
  • cavity.as, cavity.prm, receptor.mol2: Information about thymidine kinase, which is required to run rDock. A detailed explanation is given below.
  • 250k_rndm_zinc_drugs_clean.smi: ZINC dataset, taken from mkusner/grammarVAE
  • fpscores.pkl.gz: Used to calculate the J score, taken from mkusner/grammarVAE
  • Python files: Used in the experiments. Parts of zinc_grammar.py and cfg_util.py are from mkusner/grammarVAE.

How to generate files required to execute rDock

The cavity.prm in this repository is for docking simulation with KITH. If you want to run ChemGE for a different protein, you need to generate these files yourself.

The steps below assume that rDock has been installed as described above.

  1. Download receptor.pdb and crystal_ligand.mol2 from DUD-E (e.g. the KITH entry in DUD-E)

  2. Execute the following commands (Open Babel is required to run these scripts)

$ ChemGE/util/pdb2mol ./receptor.pdb  # pdb -> mol2
$ ChemGE/util/mol2sd crystal_ligand.mol2 # mol2 -> sd(f)
$ ChemGE/util/gen_prm crystal_ligand.sd receptor.mol2 > cavity.prm
$ $RBT_ROOT/build/exe/rbcavity -r cavity.prm -W
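If you need to repeat these steps for several proteins, the same four commands can be driven from Python. The following is a minimal sketch that mirrors the commands above one-to-one; `cavity_commands` and `run_all` are hypothetical helper names, and the script assumes it is run in the directory containing receptor.pdb and crystal_ligand.mol2.

```python
import os
import subprocess


def cavity_commands(chemge_dir, rbt_root):
    """Return (argv, stdout_path) pairs for the four preparation steps.

    stdout_path is None unless the command's output must be redirected.
    """
    util = os.path.join(chemge_dir, "util")
    rbcavity = os.path.join(rbt_root, "build", "exe", "rbcavity")
    return [
        ([os.path.join(util, "pdb2mol"), "./receptor.pdb"], None),       # pdb -> mol2
        ([os.path.join(util, "mol2sd"), "crystal_ligand.mol2"], None),   # mol2 -> sd
        ([os.path.join(util, "gen_prm"), "crystal_ligand.sd", "receptor.mol2"],
         "cavity.prm"),                                                  # write cavity.prm
        ([rbcavity, "-r", "cavity.prm", "-W"], None),                    # generate cavity
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for argv, stdout_path in commands:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)


if __name__ == "__main__":
    run_all(cavity_commands("ChemGE", os.environ["RBT_ROOT"]))
```

Because `check=True` raises on a non-zero exit code, a failure in an early step (for example, a missing Open Babel installation) stops the run before rbcavity is called on an incomplete cavity.prm.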

License

This project is licensed under the terms of the MIT license.
