This repository contains the analysis source code used in the paper.
You can download the relevant datasets using the commands
wget http://cb.csail.mit.edu/cb/uncertainty-ml-mtb/data.tar.gz
tar xvf data.tar.gz
within the same directory as this repository.
The major Python package requirements and their tested versions are in requirements.txt. These are the requirements for most of the experiments below, including for the GP-based models. These experiments were run with Python version 3.7.4 on Ubuntu 18.04.
For the Bayesian neural network experiemnts, we used the edward
package (version 1.3.5) alongside tensorflow
on a CPU (version 1.5.1) in a separate conda environment. These experiments used Python 3.6.10.
We also used the RDKit (version 2017.09.1) within its own separate conda environment with Python 3.6.10; download instructions can be found here.
The command for running the cross-validation experiments is
# Average case metrics.
bash bin/cv.sh
# Lead prioritization (all).
bash bin/exploit.sh
# Lead prioritization (separated by quadrant).
bash bin/quad.sh
which will launch the CV experiments for various models at different seeds implemented in bin/train_davis2011kinase.py
.
The command for running the prediction-based discovery experiments (to identify new candidate inhibitors in the ZINC/Cayman dataset) is
python bin/predict_davis2011kinase.py MODEL exploit N_CANDIDATES [TARGET] \
> predict.log 2>&1
which will launch a prediction experiment for the MODEL
(one of gp
, sparsehybrid
, or mlper1
for the GP, MLP + GP, or MLP, respectively) to acquire N_CANDIDATES
number of compounds. The TARGET
argument is optional, but will restrict acquisition to a single protein target. For example, to acquire the top 100 compounds for PknB, the command is:
python bin/predict_davis2011kinase.py gp exploit 100 pknb > \
gp_exploit100_pknb.log 2>&1
To incorporate a second round of prediction, you can also specify an additional text file argument at the command line, e.g.,
python bin/predict_davis2011kinase.py gp exploit 100 pknb data/prediction_results.txt \
> gp_exploit100_pknb_round2.log 2>&1
Docking experiments to validate generative designs selected by a GP, MLP + GP, and MLP can be launched by
bash bin/dock.sh
using the structure in data/docking/
.
Experiments testing out-of-distribution prediction of avGFP fluorescence can be launched by
bash bin/gfp.sh