This repository contains biomedical relation extraction systems that recognize statements of relations between chemical compounds/drugs and genes/proteins in biomedical literature. The code was developed for our participation in the BioCreative VI Task 5 (CHEMPROT) challenge.
Contact: Farrokh Mehryary, farmeh@utu.fi
The code contains three main parts:
- An SVM-based biomedical relation extraction system which relies on a rich set of features. The code/instructions for this system can be found here: https://github.com/jbjorne/TEES
- Two deep learning-based biomedical relation extraction systems (I-ANN and ST-ANN).
- A script that combines the predictions of the SVM and I-ANN systems: https://github.com/jbjorne/TEES/blob/development/Utils/Combine.py
You will need the following prerequisites to run the code in this git repository:
- Python 2.7
- Theano 0.9.0 library for Python (See: http://deeplearning.net/software/theano/ )
- Keras 2.0.6 library for Python (See: https://keras.io/ )
- NetworkX 1.11 library for Python (See: https://networkx.github.io/documentation/networkx-1.10/overview.html )
- scikit-learn library for Python (See: http://scikit-learn.org/stable/ )
- wvlib library for Python (See: https://github.com/spyysalo/wvlib )
- Pre-trained Word2Vec model (Download from: http://evexdb.org/pmresources/vec-space-models/PubMed-and-PMC-w2v.bin and see: http://bio.nlplab.org/ )
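Before running anything, it may help to confirm that the installed libraries match the versions listed above. The following is a minimal sketch and not part of this repository: the function name `check_prerequisites` and the `EXPECTED` table are hypothetical, the import names (`sklearn` for scikit-learn, `wvlib` for wvlib) are assumed to be importable from your Python path, and the check relies on each module exposing a `__version__` attribute.

```python
import importlib

# Versions taken from the prerequisites list above.
# None means the list does not pin a version for that library.
EXPECTED = {
    "theano": "0.9.0",
    "keras": "2.0.6",
    "networkx": "1.11",
    "sklearn": None,   # scikit-learn (assumed import name)
    "wvlib": None,     # assumed to be on the Python path
}


def check_prerequisites(expected=EXPECTED, importer=importlib.import_module):
    """Return {module_name: (status, installed_version)} for each library."""
    report = {}
    for name, wanted in sorted(expected.items()):
        try:
            module = importer(name)
        except ImportError:
            report[name] = ("missing", None)
            continue
        # Not every module defines __version__; fall back to None.
        found = getattr(module, "__version__", None)
        if wanted is None or found == wanted:
            report[name] = ("ok", found)
        else:
            report[name] = ("version mismatch", found)
    return report
```

Calling `check_prerequisites()` with no arguments imports each library in turn and returns one status per entry, which can then be printed or inspected. The Word2Vec model is a data file rather than a module, so it is not covered by this check.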
The code can be run entirely on a CPU. However, we recommend running it on a GPU for faster execution, in which case the CUDA libraries must be installed.
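Theano reads its device selection from the `THEANO_FLAGS` environment variable, and Keras reads its backend from `KERAS_BACKEND`. Both must be set before `theano` or `keras` is imported. The sketch below shows one way to switch between CPU and GPU under those assumptions. The helper `configure_theano` is hypothetical and not part of this repository. The value `device=cuda` assumes Theano's gpuarray backend (which also needs libgpuarray/pygpu); setups using the older backend would likely use `device=gpu` instead.

```python
import os


def configure_theano(use_gpu=False, environ=None):
    """Set the Keras backend and Theano device; call before importing keras."""
    env = os.environ if environ is None else environ
    # "cuda" selects the gpuarray backend (assumption, see lead-in).
    device = "cuda" if use_gpu else "cpu"
    env["KERAS_BACKEND"] = "theano"
    # Note: this overwrites any THEANO_FLAGS value that is already set.
    env["THEANO_FLAGS"] = "device=%s,floatX=float32" % device
    return env["THEANO_FLAGS"]
```

The same effect can be had without any code by exporting the two variables in the shell before launching Python. The helper only makes the choice explicit in one place.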
Please cite our paper if you use (parts of) our code:
Mehryary et al. (2018) Potent pairing: ensemble of long short-term memory networks and support vector machine for chemical-protein relation extraction. Database.