relna - Biomedical Text Mining for Relation Extraction

Relna is a text mining (TM) tool for extracting relations between transcription factors and genes / gene products. To the best of our knowledge, it is the first text mining tool for relation extraction of transcription factors and their associated proteins. It was developed as part of a thesis at Technical University, Munich, and is built on the nalaf framework, which was itself developed as part of two other theses at Technical University, Munich. The tool is generic enough to be extended with custom modules, e.g. parsers, features, and taggers. The method uses Support Vector Machines and allows for the use of tree kernels.

The nalaf framework is well documented here.

As part of the thesis, an associated corpus of the same name (relna) was annotated using tagtog. The relna corpus consists of 140 documents that were semi-automatically annotated for named entities using GNormPlus and manually annotated for relations. The motivation for relation extraction for transcription factors and gene / gene products, along with the corpus statistics, is documented here.

Using our method, we achieve an F-measure of 69.3% on the relna corpus. The full results of our experiments are available here.

The pipeline used by relna is as follows:

Install

Requirements

Python 3
SVMLight, linear vs tree kernel:
- The default is to use SVMLight with linear kernels, already defined in https://github.com/Rostlab/nalaf.
- If using SVMLight TK for tree kernels:
  - BLLIP Parser
  - SVMLight-TK-1.2
    - The easiest way to install it is to download compiled binaries from the official website.
    - You will have to fill out a form to get this, and build it using the given Makefile.
    - Place the binaries svm_classify and svm_learn in your $PATH (note that, as of now, this is also needed by nalaf for SVMLight).
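
Before running relna, it can help to confirm that the two binaries named above are actually reachable. The following is a minimal sketch using only the Python standard library; the helper name `missing_binaries` is hypothetical and is not part of relna or nalaf.

```python
import shutil

# Binary names taken from the requirements above.
REQUIRED_BINARIES = ("svm_learn", "svm_classify")


def missing_binaries(names=REQUIRED_BINARIES):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


missing = missing_binaries()
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("svm_learn and svm_classify found on PATH.")
```

If a name is reported as missing, check that the directory containing the SVMLight binaries has been added to `$PATH` in the shell that launches Python.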

Install Code

Installation of nalaf

git clone https://github.com/Rostlab/nalaf
cd nalaf
python3 setup.py install
python3 -m nalaf.download_corpora

Installation of relna

git clone https://github.com/Rostlab/relna.git
cd relna
python3 setup.py install
python3 -m relna.download_corpora

Once the package is registered on PyPI, you will be able to install relna simply with:

pip3 install relna

Examples

Run:

relna.py is a simple example of how to use relna for prediction only, with a pre-trained model:
- python3 relna.py -c [PATH SVMLight BIN DIR] -p 10383460
- python3 relna.py -c [PATH SVMLight BIN DIR] -s "Conclusion: we find that Ubc9 interacts with the androgen receptor (AR), a member of the steroid receptor family of ligand-activated transcription factors. In transiently transfected COS-1 cells, AR-dependent but not basal transcription is enhanced by the coexpression of Ubc9."
- python3 relna.py -c [PATH SVMLight BIN DIR] -d example.txt
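
The three invocations above can also be scripted. The sketch below only assembles the same command lines; it assumes, based on the examples, that `-p` takes a PubMed ID, `-s` a text string, and `-d` a document file, and that exactly one of them is passed per run. The function names and the directory `/opt/svmlight` are hypothetical.

```python
import subprocess


def build_relna_command(svmlight_bin_dir, pmid=None, text=None, path=None):
    """Build the argument list for one relna.py prediction run.

    Assumption: exactly one of pmid (-p), text (-s) or path (-d) is given.
    """
    inputs = [("-p", pmid), ("-s", text), ("-d", path)]
    chosen = [(flag, value) for flag, value in inputs if value is not None]
    if len(chosen) != 1:
        raise ValueError("pass exactly one of pmid, text or path")
    flag, value = chosen[0]
    return ["python3", "relna.py", "-c", svmlight_bin_dir, flag, str(value)]


def run_relna(command):
    """Run the command from the relna checkout; raises if relna.py fails."""
    return subprocess.run(command, check=True)


# Only prints the command; call run_relna(cmd) to actually execute it.
cmd = build_relna_command("/opt/svmlight", pmid=10383460)
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when the `-s` text contains spaces, quotes, or parentheses.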

Future Work

Important:

Implement neural networks (Theano or TensorFlow, once they are released for Python 3) for training and classification, and evaluate their performance.
Implement bootstrapping for relation extraction (similar to nalaf, where it has been done for entities)
Implement multiple sentence models, looking at relations at a distance of one sentence and beyond
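
For the multi-sentence item, one possible starting point is candidate generation: pairing entities whose sentences lie within a given distance of each other. This is only an illustrative sketch; the tuple representation and the entity IDs are hypothetical and do not reflect nalaf's or relna's data model.

```python
from itertools import combinations


def candidate_pairs(entities, max_sentence_distance=1):
    """Yield entity pairs whose sentences are at most max_sentence_distance apart.

    entities: iterable of (entity_id, sentence_index) tuples.
    """
    for (id_a, sent_a), (id_b, sent_b) in combinations(entities, 2):
        if abs(sent_a - sent_b) <= max_sentence_distance:
            yield id_a, id_b


# Hypothetical mentions: (entity id, index of the sentence containing it).
mentions = [("T1", 0), ("T2", 0), ("T3", 1), ("T4", 3)]
print(list(candidate_pairs(mentions, max_sentence_distance=1)))
# → [('T1', 'T2'), ('T1', 'T3'), ('T2', 'T3')]
```

With `max_sentence_distance=0` this reduces to same-sentence pairs only; larger values add cross-sentence candidates, which would then need their own features and evaluation.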

Not-So-Important:

Implement coreference resolution (might increase performance slightly)
Experiment with tree kernels (SVMLight TK), which achieve a very high precision (P>91), to extract highly accurate relationships from all of PubMed. In the end, that may give better extraction results, since the lower recall (R~21) is compensated for by the sheer size of PubMed.
spaCy plans to implement its own constituency parser; once it is available, replace BLLIP with spaCy for speed and efficiency (no linking to external C/C++ libraries)
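
To put the tree-kernel trade-off in numbers, here is a small sketch of the standard F-measure formula applied to the approximate precision and recall quoted above. It assumes the balanced F1 variant; the README does not state which variant its reported F-measure uses.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Approximate tree-kernel figures from the list above, as fractions.
print(round(f_measure(0.91, 0.21), 3))
# → 0.341
```

The harmonic mean is dominated by the smaller value, so high precision alone yields a low F-measure on a fixed corpus; the argument above is that running over all of PubMed offsets the low recall in absolute terms.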
