GitHub - rothadamg/UPSITE: PPI BioNLP classification system

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
Classifiers		Classifiers
Core		Core
Detectors		Detectors
Evaluators		Evaluators
ExampleBuilders		ExampleBuilders
ExampleWriters		ExampleWriters
Tools		Tools
Utils		Utils
test		test
text_files		text_files
.gitignore		.gitignore
.project		.project
.pydevproject		.pydevproject
Cosine_Sim.py		Cosine_Sim.py
LICENSE		LICENSE
LSA_2.py		LSA_2.py
MANIFEST.in		MANIFEST.in
README		README
README~		README~
UPSITE.py		UPSITE.py
__init__.py		__init__.py
batch.py		batch.py
classify.py		classify.py
configure.py		configure.py
pmids.py		pmids.py
queries.py		queries.py
setup.py		setup.py
train.py		train.py
visualize.py		visualize.py

Repository files navigation

UPSITE
===============================

UPSITE is a large scale bioNLP classification system created by researchers from the University of Pittsburgh and Carnegie Mellon University. At the time of writing this documentation, the corresponding publication had been accepted to GLBio2015 conference and IEEE Transactions on Computational Biology and bioinformatics under the title “Text Mining for Validating Protein Interactions”. An exhaustive description of the system can be found within that paper. UPSITE has gained many functions over the course of its development and therefore can be used for a wide variety of ML related entity classification problems. The default and primary use of UPSITE is its ability to automatically synthesize and collate large portions of the ~24million document PubMed corpus and automatically classify entity-entity interactions (primarily PPIs). 
 
UPSITE was designed using only the highest performing modules available to the BioNLP community in an effort to minimize pipeline error propagation. This is great for optimizing performance, but makes its installation quite difficult. For this reason, I highly recommend accessing the public ally available Amazon Machine Instance (AMI) at the following URL: https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#Images:visibility=public-images;search=UPSITE;sort=name 
The AMI contains a fully pre-configured Ubuntu 14.04 environment and running version of UPSITE. 

To install UPSITE, clone the repository to your local system and setup TEES by running the configure.py file. Refer to the TEES documentation for additional information- https://github.com/jbjorne/TEES/wiki

Once you have installed TEES and any necessary dependencies, You may run the entire UPSITE algorithm through the UPSITE.py file. Run UPSITE using default arguments by navigating to its containing folder and typing "python UPSITE.py" into your terminal. Please refer to OptionParser help for the available command line args and explainations.

UPSITE was written by Adam G. Roth. The author can be contacted for questions or collorborations at Roth.AdamG@gmail.com

Turku Event Extraction System 2.2
=================================

Turku Event Extraction System (TEES) is a free and open source natural language
processing system developed for the extraction of events and relations from 
biomedical text. It is written mostly in Python, and should work in generic 
Unix/Linux environments.

TEES has been evaluated in the following Shared Tasks and models for predicting
their targets are included in this release.

 * BioNLP 2009 Shared Task (1st place)
 * BioNLP 2011 Shared Task (1st place in 4/8 tasks, only system to participate
    in all tasks)
 * DDI 2011 (Drug-drug interactions) Challenge (4th place, at 96% of the 
    performance of the best system)

For more information and documentation, see the TEES wiki at 
https://github.com/jbjorne/TEES/wiki


Quick Start
===========

To get started with TEES, download the latest stable release from
http://sourceforge.net/projects/tees/files or the current 
version from the repository. After downloading, TEES can optionally be installed 
as a module using "setup.py", but this is not required, and the program can 
simply be used from the unpacked archive.

However, before using TEES the external programs and datafiles need to be 
installed using the interactive configuration tool "configure.py", located
in the package root directory:

python configure.py

After TEES had been configured, you can predict events or relations for text 
with classify.py. Using the "-m" (model) switch, you can select one of the 
pre-computed models (listed at https://github.com/jbjorne/TEES/wiki/Classifying). 
For example, to run TEES prediction for the BioNLP 2011 GENIA development 
corpus, use:

python classify.py -m GE11-devel -i GE11-devel -o OUTSTEM
 
where "OUTSTEM" is the output file stem. To try TEES on unannotated text, you 
can give "classify.py" a PubMed citation id, such as:

python classify.py -m GE11 -i 9668063 -o OUTSTEM
  
TEES will download the abstract and use the
integrated preprocessing pipeline to split the text into sentences (with the
GENIA Sentence Splitter, http://www.nactem.ac.uk/y-matsu/geniass/), detect
named entities (with BANNER, http://banner.sourceforge.net/) and parse the
text (with BLLIP Parser using David McClosky's biomodel, 
http://bllip.cs.brown.edu/resources.shtml and Stanford format
conversion, http://nlp.stanford.edu/software/lex-parser.shtml), after which
events are detected from the document.

Using TEES
==========

The primary user interface to TEES consists of the following programs

 * classify.py - Predict events/relations with an existing model
 * train.py - Train a new event/relation extraction model
 * batch.py - Batch process large sets of input files
 * configure.py - Install TEES models, external tools and corpora
 * visualize.py - Visualize the events and parse for a sentence
 
For information on using these programs, see the TEES wiki at 
https://github.com/jbjorne/TEES/wiki

TEES also has a number of modules that can be used as standalone executables,
including the wrappers for external tools such as parsers. A list of these
executables can be found at https://github.com/jbjorne/TEES/wiki/Programs