This repository contains team Heartbreak's project for the Data and Visual Analytics course (Spring 2019) at the Georgia Institute of Technology.
The purpose of this project is to identify protein-protein interactions in biomedical literature.
Item | Purpose |
---|---|
app.py | The backend launch file |
papers | Sample PDF papers |
ml | Contains data processing and classifier training code |
api | API served by the backend |
genes | Dataset of gene names |
datasets | Training, testing and original datasets |
web-ui | React front-end |
The project has two parts:
- Backend
- Frontend
The backend provides a REST API that takes a paper in PDF or TXT format as input and returns JSON describing the genes and their interactions.
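A client talks to this API by uploading the paper. The sketch below builds (but does not send) a `multipart/form-data` POST with only the Python 3 standard library. The endpoint URL and the form field name `file` are hypothetical; the real route and field are defined in the `api` directory.

```python
import mimetypes
import urllib.request
import uuid


def build_upload_request(url, filename, data, field="file"):
    """Build, without sending, a multipart/form-data POST carrying one file.

    Assumption: `url` and `field` are placeholders, not the backend's real
    route or form field name.
    """
    boundary = uuid.uuid4().hex
    # Fall back to a generic type if the extension is unknown.
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    request = urllib.request.Request(url, data=head + data + tail, method="POST")
    request.add_header("Content-Type", f"multipart/form-data; boundary={boundary}")
    return request
```

Sending it is then `json.load(urllib.request.urlopen(request))`, which yields the JSON document the backend returns.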
When the backend starts, it trains the classifier on the transformed data in the `datasets/training` directory.
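This README does not say which classifier the backend trains. As an illustration of the train-once-at-startup pattern only, here is a minimal Python 3 multinomial Naive Bayes with add-one smoothing, fitted when the module is loaded. The inline training rows and the `interaction`/`none` labels are hypothetical stand-ins for the files in `datasets/training`, and the whitespace tokenizer stands in for the NLTK-based processing in `ml`.

```python
import math
from collections import Counter, defaultdict


def tokenize(sentence):
    # Lowercase whitespace tokenizer (a stand-in for NLTK tokenization).
    return sentence.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, samples):
        for sentence, label in samples:
            self.class_counts[label] += 1
            for word in tokenize(sentence):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, sentence):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # log P(label) + sum over words of log P(word | label)
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(sentence):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical rows; the real backend reads datasets/training instead.
TRAINING_DATA = [
    ("GENE_A binds GENE_B in vivo", "interaction"),
    ("GENE_A phosphorylates GENE_B", "interaction"),
    ("GENE_A and GENE_B were measured separately", "none"),
    ("GENE_A expression was unrelated to GENE_B", "none"),
]

# Fitted once, when the module is imported (that is, at backend startup).
CLASSIFIER = NaiveBayes().fit(TRAINING_DATA)
```

Training at import time keeps request handling fast, at the cost of a slower startup.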
The frontend is a user interface built with React. It uses Material-UI for a modern look and D3.js to visualize the interaction graph.
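D3's force-directed layout (d3-force) works on an array of nodes plus an array of links whose `source` and `target` fields can refer to node ids via `forceLink().id(...)`. The Python sketch below shows that reshaping. The input shape `{"interactions": [{"gene_a": ..., "gene_b": ...}]}` is an assumption, not the backend's documented output.

```python
def to_force_graph(response):
    """Convert a (hypothetical) backend response into D3 force-layout input.

    Assumed input:  {"interactions": [{"gene_a": str, "gene_b": str}, ...]}
    Output:         {"nodes": [{"id": str}],
                     "links": [{"source": str, "target": str, "value": int}]}
    """
    link_counts = {}
    for item in response.get("interactions", []):
        # Treat interactions as undirected: (A, B) and (B, A) are one edge.
        pair = tuple(sorted((item["gene_a"], item["gene_b"])))
        link_counts[pair] = link_counts.get(pair, 0) + 1

    # Only genes that take part in at least one interaction become nodes.
    genes = sorted({gene for pair in link_counts for gene in pair})
    return {
        "nodes": [{"id": gene} for gene in genes],
        "links": [
            {"source": a, "target": b, "value": count}
            for (a, b), count in sorted(link_counts.items())
        ],
    }
```

The `value` field counts how often a pair was reported, which a renderer could map to stroke width.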
There are two ways to run this project:
- Using Docker (the easy way)
- Using Python and Node.js directly
Running with Docker requires:
- Windows 10 Pro, Linux or macOS
- Docker

To run with Docker:
- Open a terminal in the project root folder
- Run `docker-compose up -d`
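With `-d`, Docker Compose starts the containers in the background and returns immediately, so the UI may not accept connections yet. A small Python 3 helper can poll the port until it opens; the timeout and polling interval defaults are arbitrary choices.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once host:port accepts TCP connections, False after timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful TCP handshake means something is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the Docker setup, `wait_for_port("localhost", 4000)` blocks until the web UI port is open. An open port only shows that the server is listening, not that the backend has finished training.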
Running without Docker requires:
- Python 2.7
- Node.js (latest LTS version)
- Java 8 Runtime Environment (JRE)
To start the backend:
- Go to the repository directory
- Install the Python dependencies: `pip install -r requirements.txt`
- Download the NLTK packages: `python -m nltk.downloader punkt averaged_perceptron_tagger universal_tagset`
- Run `python app.py`
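If the backend fails at startup, missing NLTK data is a common cause. The helper below uses `nltk.data.find`, which raises `LookupError` when a resource is absent, to report which of the three downloaded packages cannot be found. The resource paths are the standard `nltk_data` locations for these package names, and the code also runs under Python 2.7.

```python
# Standard nltk_data locations for the packages downloaded above.
REQUIRED_RESOURCES = {
    "punkt": "tokenizers/punkt",
    "averaged_perceptron_tagger": "taggers/averaged_perceptron_tagger",
    "universal_tagset": "taggers/universal_tagset",
}


def missing_nltk_resources():
    """Return the names of required NLTK packages that cannot be found."""
    try:
        import nltk
    except ImportError:
        # NLTK itself is not installed, so every resource counts as missing.
        return sorted(REQUIRED_RESOURCES)
    missing = []
    for name, path in sorted(REQUIRED_RESOURCES.items()):
        try:
            nltk.data.find(path)
        except LookupError:
            missing.append(name)
    return missing
```

An empty list means all three packages are installed. Any names it returns can be passed to `python -m nltk.downloader`.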
To start the frontend:
- Go to the `<repository-root>/web-ui` directory
- Install the npm dependencies: `npm install`
- Run `npm start`
To analyze a paper:
- If running through Docker, navigate to `localhost:4000`; otherwise, navigate to `localhost:3000`
- Click the "Choose File" button. Only PDF and TXT files are supported.
- Select the file `PIIS1097276506003376.pdf` in the `papers` folder in the repository root
- Click "ANALYZE FILE"
- Wait for the graph to render
- Drag and drop nodes to rearrange the graph
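The UI accepts only PDF and TXT files. A script that calls the backend directly could apply the same check before uploading. This sketch decides by file extension alone (case-insensitive), which is an assumption; it does not inspect file contents.

```python
import os

SUPPORTED_EXTENSIONS = {".pdf", ".txt"}


def is_supported_upload(filename):
    """Return True if the file extension is PDF or TXT, ignoring case."""
    extension = os.path.splitext(filename)[1].lower()
    return extension in SUPPORTED_EXTENSIONS
```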
Datasets used:
- Custom-made dataset of gene interaction sentences
- BioC-BioGRID
- `reviewed-home-sapien-genes.tab` from UniProt
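UniProt tab-delimited exports have a header row, and their gene-name column lists a protein's primary gene name followed by synonyms, separated by spaces. The sketch below builds a gene-name set from such a file and uses it for dictionary lookup in a sentence. The column header `Gene names` is an assumption; check it against the actual header of `reviewed-home-sapien-genes.tab`.

```python
import csv
import re


def load_gene_names(lines, column="Gene names"):
    """Collect gene symbols from a UniProt tab-delimited export.

    Assumption: the first line is a header and `column` holds space-separated
    gene names (primary name first, then synonyms).
    """
    names = set()
    for row in csv.DictReader(lines, delimiter="\t"):
        names.update((row.get(column) or "").split())
    return names


def find_genes(sentence, gene_names):
    """Return gene names mentioned in `sentence`, in order of first appearance."""
    found = []
    for token in re.findall(r"[A-Za-z0-9-]+", sentence):
        if token in gene_names and token not in found:
            found.append(token)
    return found
```

Usage: `with open("genes/reviewed-home-sapien-genes.tab", newline="") as f: genes = load_gene_names(f)`, where the path inside `genes` is an assumption. Exact, case-sensitive matching is a simplification; real mentions often vary in case and spelling.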