Modeling protein-RNA interactions with GCNN

Abstract: We developed a procedure for predicting protein-RNA interactions on a protein using graph convolutional neural networks (GCNN). We obtained the data from the PDB database, preprocessed it into a graph structure, and added appropriate features to each atom, which makes the data suitable for graph neural network models. We analyzed the models and report the results with different performance metrics. We then selected the best model, which achieved ROC AUC = 0.9. We also implemented a graphical interface that visualizes protein structures in 3D space together with the predictions of the selected model.

Installation

Clone the repository:

git clone https://github.com/AGregorc/protein-RNA-iteractions.git

Next, install the requirements with either Anaconda or pip. Anaconda is preferred because it provides cudatoolkit.

Using Anaconda

See the Anaconda installation guides for details. Here is one example of how to install Miniconda on Linux:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sha256sum Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
rm Miniconda3-latest-Linux-x86_64.sh
source miniconda3/bin/activate

conda create -n torch python=3.7
conda activate torch

Move to the protein-RNA-iteractions directory (e.g. cd protein-RNA-iteractions) and then run:

conda install --file requirements.txt -c pytorch -c dglteam -c conda-forge
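
To confirm that the environment is usable, a quick import check can help. This is a minimal sketch: it assumes that requirements.txt installs the torch and dgl packages (suggested by the pytorch and dglteam channels above, but not verified here), and the helper name installed_packages is hypothetical.

```python
import importlib.util


def installed_packages(names):
    """Map each top-level package name to True if it can be found for import."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Assumed package names; adjust them to match requirements.txt.
status = installed_packages(["torch", "dgl"])
for name, ok in status.items():
    print(f"{name}: {'found' if ok else 'MISSING'}")
```

Run it inside the activated torch environment; a MISSING entry suggests the conda install step did not complete.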

Usage

Training models

To train models, we first need to load the PDB and DSSP data and preprocess it. To load and preprocess the data listed in data/pdbs.lst, run python src/main.py create. To train models, run python src/main.py train. The default model is two_branches_small; it can be changed manually via the model_names variable inside the main function of the src/main.py script.
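
The two steps can also be chained from a small Python driver. The sketch below only wraps the python src/main.py create and python src/main.py train commands described above; the helper names main_command and run_actions are hypothetical, and it assumes the script is started from the repository root.

```python
import subprocess
import sys


def main_command(action):
    """Build the command line for one src/main.py action (e.g. 'create')."""
    return [sys.executable, "src/main.py", action]


def run_actions(actions, dry_run=True):
    """Build the commands for the given actions, in order.

    With dry_run=True the commands are only returned, not executed.
    With dry_run=False each command is run and the first failure stops the chain.
    """
    commands = [main_command(action) for action in actions]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands


# Preview the two steps: preprocess the data, then train.
for cmd in run_actions(["create", "train"]):
    print(" ".join(cmd))
```

Calling run_actions(["create", "train"], dry_run=False) would execute both steps, so training starts only if preprocessing succeeded.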

The src/main.py script can also do several other things, e.g. tune the weight parameter for nn.CrossEntropyLoss, visualize models and metrics, and compute feature importance. Each of these is run with python src/main.py followed by one of these additional arguments: tune, visualize_models, visualize_metrics or feature_importance.
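
For readers unfamiliar with the weight parameter: nn.CrossEntropyLoss accepts one weight per class and, with its default mean reduction, returns the weighted sum of per-sample losses divided by the sum of the weights of the target classes. The following pure-Python sketch reproduces that arithmetic for illustration only; it is not the repository's tuning code, and the function name and example values are made up.

```python
import math


def weighted_cross_entropy(logits, targets, weights):
    """Weighted mean cross-entropy over a batch.

    logits:  list of per-sample score lists, one score per class
    targets: list of true class indices
    weights: one weight per class
    """
    total, norm = 0.0, 0.0
    for scores, y in zip(logits, targets):
        m = max(scores)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        nll = log_z - scores[y]  # equals -log(softmax(scores)[y])
        total += weights[y] * nll
        norm += weights[y]
    return total / norm


logits = [[2.0, 0.0], [0.0, 0.0]]
targets = [0, 1]
print(weighted_cross_entropy(logits, targets, [1.0, 1.0]))  # equal class weights
print(weighted_cross_entropy(logits, targets, [1.0, 3.0]))  # class 1 weighted 3x
```

Raising the weight of a class makes errors on that class contribute more to the loss, which is why the value is worth tuning.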

Plot with matplotlib

To plot one PDB crystal using matplotlib, see the src/data/PlotMPL.py script. Here are two examples of how to call the functions plot_from_file and plot_predicted:

plot_from_file('1a1t.pdb', lambda atom: None, load_feat_word_to_ixs(GENERAL_WORD_TO_IDX_PATH))
plot_predicted('1a1t.pdb', model, word_to_ixs)
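
As background on what such a plot draws, the sketch below reads atom coordinates directly from the fixed-width ATOM/HETATM records of a PDB file (x, y and z occupy columns 31-38, 39-46 and 47-54). It is an independent illustration with a hypothetical helper name and a hand-written example record; PlotMPL.py may load structures differently.

```python
def atom_coordinates(pdb_lines):
    """Yield (atom_name, x, y, z) for each ATOM/HETATM record."""
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            name = line[12:16].strip()  # atom name, columns 13-16
            # x, y, z are three consecutive 8-character fields
            x, y, z = (float(line[i:i + 8]) for i in (30, 38, 46))
            yield name, x, y, z


# A single hand-written ATOM record, used only for illustration.
example = "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N"
print(list(atom_coordinates([example])))
```

The resulting coordinates could then be passed to a matplotlib 3D scatter plot.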

Running processes and user interface

The scheme above shows our default process structure. Each process is packaged into a separate Docker image.

Run with docker compose

First set up environment variables:

export ADMIN_PASS=<your-password>
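
A small pre-flight check can catch a missing variable before the containers start. This is a sketch under the assumption that ADMIN_PASS is the only required variable, since it is the only one named here; docker-compose.yml may reference others, and the helper name missing_variables is hypothetical.

```python
import os


def missing_variables(required, env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_variables(["ADMIN_PASS"])
if missing:
    print("Set these variables before running docker-compose up:", ", ".join(missing))
else:
    print("Environment looks ready.")
```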

Check the docker-compose.yml file to make sure the configuration suits your needs (e.g. check the VUE_APP_API_URL variable), and look at the UPDATE_MODEL variable in src/update_dataset_and_preprocess.py: Data Updater and Model Updater can be run from the same Docker container.

After you are all set, run all processes using:

docker-compose up
