GitHub - tjustorm/uncertainty: Learning with uncertainty for biological discovery and design

Learning with Uncertainty for Biological Discovery and Design

This repository contains the analysis source code used in the paper.

Data

From within this repository's top-level directory, download and extract the datasets with

wget http://cb.csail.mit.edu/cb/uncertainty-ml-mtb/data.tar.gz
tar xvf data.tar.gz
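If `wget` is unavailable, the same download and extraction can be done from Python using only the standard library. This is a minimal sketch: the helper names `download` and `extract` are hypothetical, and it assumes you run it from the repository's top-level directory so the archive unpacks alongside `bin/`.

```python
import tarfile
import urllib.request
from pathlib import Path
from typing import List

DATA_URL = "http://cb.csail.mit.edu/cb/uncertainty-ml-mtb/data.tar.gz"


def download(url: str, dest: Path) -> Path:
    """Fetch url into dest, skipping the download if dest already exists."""
    dest = Path(dest)
    if not dest.exists():
        urllib.request.urlretrieve(url, str(dest))
    return dest


def extract(archive: Path, out_dir: Path) -> List[str]:
    """Unpack a gzipped tarball into out_dir and return its member names."""
    with tarfile.open(str(archive), "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons support extraction filters; "data" rejects unsafe paths.
            tar.extractall(str(out_dir), filter="data")
        else:
            tar.extractall(str(out_dir))
    return names


# Usage (run from the repository's top-level directory):
#   archive = download(DATA_URL, Path("data.tar.gz"))
#   extract(archive, Path("."))
```

Skipping the download when the archive already exists makes the helper safe to rerun after a failed extraction.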

Dependencies

The major Python package requirements and their tested versions are in requirements.txt. These cover most of the experiments below, including those using the GP-based models. These experiments were run with Python 3.7.4 on Ubuntu 18.04.

For the Bayesian neural network experiments, we used the edward package (version 1.3.5) with CPU-only TensorFlow (version 1.5.1) in a separate conda environment with Python 3.6.10.

We also used RDKit (version 2017.09.1) in its own separate conda environment with Python 3.6.10; installation instructions are available in the RDKit documentation.
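Because the experiments span three environments with different tested versions, it can help to confirm that the active environment matches requirements.txt before launching anything. The sketch below is hypothetical: it assumes requirements.txt pins packages as `name==version` (other specifiers are skipped), compares versions as plain strings, and uses `importlib.metadata`, which requires Python 3.8 or newer (under the Python 3.7.4 environment above you would need the `importlib_metadata` backport instead).

```python
import re
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
from typing import Dict, Optional, Tuple

# Matches "name==version"; other specifiers (>=, ~=, ...) are skipped.
PIN = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*==\s*([^\s;#]+)")


def parse_pins(text: str) -> Dict[str, str]:
    """Map each exactly pinned package in a requirements file to its version."""
    pins = {}
    for line in text.splitlines():
        match = PIN.match(line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def mismatches(pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {name: (pinned, installed)} for every package that differs."""
    report = {}
    for name, pinned in pins.items():
        installed = installed_version(name)
        if installed != pinned:
            report[name] = (pinned, installed)
    return report


# Usage:
#   with open("requirements.txt") as f:
#       print(mismatches(parse_pins(f.read())))
```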

Compound-kinase affinity prediction experiments

Cross-validation experiments

The commands for running the cross-validation experiments are

# Average case metrics.
bash bin/cv.sh
# Lead prioritization (all).
bash bin/exploit.sh
# Lead prioritization (separated by quadrant).
bash bin/quad.sh

which launch the CV experiments, implemented in bin/train_davis2011kinase.py, for various models across multiple random seeds.
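The scripts above repeat cross-validation over several random seeds. The actual splitting logic lives in bin/train_davis2011kinase.py and is not reproduced here; as a hypothetical sketch of the general idea, each seed below reshuffles which examples land in which fold, so metrics can be averaged across seeds. The function names and the plain k-fold scheme are assumptions (the quadrant-separated runs, for instance, may split the data differently).

```python
import random
from typing import Iterator, List, Sequence, Tuple


def kfold_indices(n: int, k: int, seed: int) -> List[List[int]]:
    """Shuffle range(n) with a seeded RNG and deal it into k near-equal folds."""
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def cv_splits(
    n: int, k: int, seeds: Sequence[int]
) -> Iterator[Tuple[int, int, List[int], List[int]]]:
    """Yield (seed, fold, train_idx, test_idx) for every seed and fold."""
    for seed in seeds:
        folds = kfold_indices(n, k, seed)
        for f, test in enumerate(folds):
            # Training set is the union of all other folds.
            train = [i for j, fold in enumerate(folds) if j != f for i in fold]
            yield seed, f, train, test
```

Using a dedicated `random.Random(seed)` instance keeps each split reproducible without touching the global RNG state.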

Discovery experiments for validation

The command for running the prediction-based discovery experiments (to identify new candidate inhibitors in the ZINC/Cayman dataset) is

python bin/predict_davis2011kinase.py MODEL exploit N_CANDIDATES [TARGET] \
    > predict.log 2>&1

which launches a prediction experiment for MODEL (one of gp, sparsehybrid, or mlper1 for the GP, MLP + GP, or MLP, respectively) to acquire N_CANDIDATES compounds. The optional TARGET argument restricts acquisition to a single protein target. For example, to acquire the top 100 compounds for PknB, run:

python bin/predict_davis2011kinase.py gp exploit 100 pknb > \
    gp_exploit100_pknb.log 2>&1
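Acquisition in exploit mode amounts to ranking candidates and keeping the best N_CANDIDATES, optionally within one target. The sketch below is a hypothetical illustration, not the code in bin/predict_davis2011kinase.py: it assumes "exploit" means greedy ranking by predicted mean, with a higher score meaning stronger predicted affinity, and the `Prediction` record is invented for the example.

```python
from typing import Iterable, List, NamedTuple, Optional


class Prediction(NamedTuple):
    compound: str  # hypothetical compound identifier
    target: str    # protein target name, e.g. "pknb"
    mean: float    # predicted affinity score (assumed: higher is better)
    std: float     # predictive uncertainty (ignored by pure exploitation)


def acquire_exploit(
    preds: Iterable[Prediction], n: int, target: Optional[str] = None
) -> List[Prediction]:
    """Return the n candidates with the highest predicted mean.

    If target is given, only predictions for that protein are considered.
    """
    pool = [p for p in preds if target is None or p.target == target]
    return sorted(pool, key=lambda p: p.mean, reverse=True)[:n]
```

Because this greedy rule ignores `std`, it illustrates only the exploitation case; an uncertainty-aware rule would fold the predictive uncertainty into the ranking key.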

To incorporate a second round of prediction, pass an additional text file as the final command-line argument, e.g.,

python bin/predict_davis2011kinase.py gp exploit 100 pknb data/prediction_results.txt \
    > gp_exploit100_pknb_round2.log 2>&1
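When scripting many runs (several models, targets, or rounds), it can help to build these command lines programmatically. This hypothetical helper only reproduces the argument order shown in the examples above (MODEL, exploit, N_CANDIDATES, optional TARGET, optional results file) and their log-file naming; it assumes the results file is passed only after a TARGET, since both arguments are positional.

```python
from typing import List, Optional

# Model names accepted by bin/predict_davis2011kinase.py, per this README.
MODELS = {"gp": "GP", "sparsehybrid": "MLP + GP", "mlper1": "MLP"}


def predict_command(
    model: str,
    n_candidates: int,
    target: Optional[str] = None,
    results_file: Optional[str] = None,
) -> List[str]:
    """Build the argv list for one exploit-mode prediction run."""
    if model not in MODELS:
        raise ValueError("unknown model: " + model)
    if n_candidates < 1:
        raise ValueError("n_candidates must be positive")
    if results_file is not None and target is None:
        raise ValueError("results_file is assumed to follow TARGET")
    argv = ["python", "bin/predict_davis2011kinase.py", model, "exploit", str(n_candidates)]
    if target is not None:
        argv.append(target)
    if results_file is not None:
        argv.append(results_file)
    return argv


def log_name(model: str, n_candidates: int, target: Optional[str] = None, round_: int = 1) -> str:
    """Mirror the log names used above, e.g. gp_exploit100_pknb_round2.log."""
    name = "{}_exploit{}".format(model, n_candidates)
    if target is not None:
        name += "_" + target
    if round_ > 1:
        name += "_round{}".format(round_)
    return name + ".log"


# Usage:
#   import subprocess
#   with open(log_name("gp", 100, "pknb"), "w") as log:
#       subprocess.run(predict_command("gp", 100, "pknb"),
#                      stdout=log, stderr=subprocess.STDOUT)
```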

Docking experiments

Docking experiments to validate generative designs selected by a GP, MLP + GP, and MLP can be launched by

bash bin/dock.sh

which uses the structure in data/docking/.

Protein fitness experiments

Experiments testing out-of-distribution prediction of avGFP fluorescence can be launched by

bash bin/gfp.sh
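Both bin/dock.sh and bin/gfp.sh are plain bash scripts, so they can also be driven from Python, for example to check prerequisites and capture output in a log as the prediction commands above do. This is a minimal sketch; `launch` is a hypothetical helper, and the prerequisite paths in the usage comment (data/docking/ for docking, data/ for the GFP experiments) are assumptions based on the paths mentioned in this README.

```python
import subprocess
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def launch(script: PathLike, log_path: PathLike, required: Optional[PathLike] = None) -> int:
    """Run `bash script`, sending stdout and stderr to log_path.

    Raises FileNotFoundError if the script, or an optional prerequisite
    path, is missing. Returns the script's exit code.
    """
    script = Path(script)
    if not script.is_file():
        raise FileNotFoundError(str(script))
    if required is not None and not Path(required).exists():
        raise FileNotFoundError(str(required))
    with open(str(log_path), "w") as log:
        proc = subprocess.run(["bash", str(script)], stdout=log, stderr=subprocess.STDOUT)
    return proc.returncode


# Usage (prerequisite paths are assumptions):
#   launch("bin/dock.sh", "dock.log", required="data/docking")
#   launch("bin/gfp.sh", "gfp.log", required="data")
```

Failing fast on a missing data directory avoids discovering the problem only after a long run has started.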
