ChemProtBioCreativeVI

This repository contains the source code of the three-stage approach to the chemical-protein interaction extraction task of the BioCreative VI challenge. The approach is described in: Lung P-Y, He Z, Zhao T & Zhang J (2018), Natural language processing based feature engineering for extracting chemical-protein interactions from literature.

Prerequisites

Python 3.4+
Sklearn
XGBoost
NLTK
Stanford Neural Network Dependency Parser
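
The Python-side prerequisites above can be checked before running the pipeline. The following is a minimal sketch, not part of the repository: it assumes the packages are importable as `sklearn`, `xgboost`, and `nltk`, and the helper names `missing_modules` and `python_version_ok` are hypothetical. It does not check the Stanford Neural Network Dependency Parser, which is installed separately.

```python
import importlib.util
import sys

# Import names assumed for the Python packages listed under Prerequisites.
REQUIRED_MODULES = ["sklearn", "xgboost", "nltk"]


def missing_modules(names):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_version_ok(version_info=sys.version_info, minimum=(3, 4)):
    """Return True if the interpreter is at least the minimum (major, minor)."""
    return tuple(version_info[:2]) >= minimum


if not python_version_ok():
    print("Python 3.4+ is required, found", sys.version.split()[0])

missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing Python packages:", ", ".join(missing))
else:
    print("All listed Python packages were found.")
```

The check only confirms that each package can be located; it does not verify versions, since the list above specifies none for the libraries.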

Data

A partial dataset used by the model is located in the data folder for demonstration purposes. It contains abstracts of PubMed articles, tagged chemical/protein entities, and labeled relations released by the task organizers. The complete dataset, as well as the gold standard for the test set, can be found at BiocreativeVI, or by contacting the organizers: Martin Krallinger & Jesús Santamaría.

Usage

In the last line of RunParser.py, specify the path to the Stanford Neural Network Dependency Parser. Then run the command:

$ sh demo.sh

This runs the pipeline and generates ChemProtTest_sumbit.tsv, in which each row contains the PubMedID, relation type, chemical entity, and protein entity.
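
The output file can be inspected with a few lines of Python. The sketch below is illustrative and not part of the repository: it assumes the file is tab-separated (as the .tsv extension suggests), has no header row, and has exactly the four columns in the order described above. The `Prediction` type, the `read_predictions` helper, and the sample rows are hypothetical placeholders, not actual pipeline output.

```python
import csv
import io
from collections import Counter
from typing import Iterable, List, NamedTuple


class Prediction(NamedTuple):
    """One output row; the field order follows the description above (assumed)."""
    pubmed_id: str
    relation_type: str
    chemical_entity: str
    protein_entity: str


def read_predictions(lines: Iterable[str]) -> List[Prediction]:
    """Parse tab-separated rows into Prediction records, skipping blank lines."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    predictions = []
    for row in reader:
        if not row:
            continue  # blank line
        if len(row) != 4:
            raise ValueError("expected 4 columns, got %d: %r" % (len(row), row))
        predictions.append(Prediction(*row))
    return predictions


# Made-up sample rows standing in for the real file (placeholder values).
sample = (
    "12345678\tCPR:4\tT1\tT10\n"
    "12345678\tCPR:3\tT2\tT11\n"
)
predictions = read_predictions(io.StringIO(sample))

# Count how many predicted relations fall under each relation type.
counts = Counter(p.relation_type for p in predictions)
for relation_type, n in sorted(counts.items()):
    print(relation_type, n)
```

To read the generated file instead of the sample, pass an open file handle, for example `open("ChemProtTest_sumbit.tsv", newline="", encoding="utf-8")`, adjusting the path to wherever demo.sh writes it.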
