IHP

Framework for identifying Human Phenotype entities

For dependencies and other uses, follow the original README.

This fork was created to accommodate an annotator for the Human Phenotype Ontology. It uses the Gold Standard Corpora and Test Suites created by Bio-Lark.

Usage

Before loading a corpus into IHP, start the Stanford CoreNLP server:

   cd bin/stanford-corenlp-full-2015-12-09/
   java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -timeout 500000 &
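
Because the server is started in the background, the next command may run before it is ready. The following is a minimal sketch of a readiness check, not part of IHP itself. It assumes the server listens on port 9000 (the CoreNLP server default when no `-port` option is given) and only checks that the port accepts TCP connections; `wait_for_port` is a hypothetical helper name.

```python
import socket
import time


def wait_for_port(host="localhost", port=9000, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Port 9000 is assumed here because the server command above passes no -port option.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable): wait a little and retry.
            time.sleep(interval)
    return False
```

Calling `wait_for_port()` before `load_corpus` and aborting when it returns `False` avoids a confusing connection error later in the run.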

Load Corpus (For both Gold Standard Corpora and Test Suite)

   python src/main.py load_corpus --goldstd hpo_train --log DEBUG
   python src/main.py load_corpus --goldstd hpo_test --log DEBUG
   python src/main.py load_corpus --goldstd tsuite --log DEBUG
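
The three load commands above differ only in the corpus name, so they can be driven from a small wrapper. This is a sketch under the assumption that it is run from the repository root and that `python` on the PATH is the interpreter IHP expects; `load_corpus_command` and `load_all` are hypothetical names, not IHP functions.

```python
import subprocess

# Corpus names taken from the commands above.
CORPORA = ["hpo_train", "hpo_test", "tsuite"]


def load_corpus_command(goldstd, log_level="DEBUG"):
    """Build the argument list for one load_corpus call."""
    return ["python", "src/main.py", "load_corpus",
            "--goldstd", goldstd, "--log", log_level]


def load_all(corpora=CORPORA, runner=subprocess.run):
    """Load each corpus in turn; check=True stops at the first failing command."""
    for name in corpora:
        runner(load_corpus_command(name), check=True)
```

Calling `load_all()` runs the three commands in order. The `runner` parameter exists only so the wrapper can be exercised without actually launching IHP.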

Train, Test and Evaluate with StanfordNER

   python src/main.py train --goldstd hpo_train --models models/hpo_train --log DEBUG
   python src/main.py test --goldstd hpo_test -o pickle data/results_hpo_train --models models/hpo_train --log DEBUG
   python src/evaluate.py evaluate hpo_test --results data/results_hpo_train --models models/hpo_train --log DEBUG

Train, Test and Evaluate with CRFSuite

   python src/main.py train --goldstd hpo_train --models models/hpo_train --log DEBUG --entitytype hpo --crf crfsuite
   python src/main.py test --goldstd hpo_test -o pickle data/results_hpo_train --models models/hpo_train --log DEBUG --entitytype hpo --crf crfsuite
   python src/evaluate.py evaluate hpo_test --results data/results_hpo_train --models models/hpo_train --log DEBUG --entitytype hpo
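
Comparing the StanfordNER and CRFSuite command sets, the only differences are the extra `--entitytype hpo` option on all three steps and `--crf crfsuite` on train and test. The sketch below makes that explicit by generating both variants from one function. It assumes, as in the commands above, that the model and result paths are derived from the training corpus name; `pipeline_commands` is a hypothetical helper, not part of IHP.

```python
def pipeline_commands(use_crfsuite=False, train="hpo_train", test="hpo_test"):
    """Return the train, test and evaluate commands as argument lists."""
    models = "models/" + train          # e.g. models/hpo_train
    results = "data/results_" + train   # e.g. data/results_hpo_train
    common = ["--models", models, "--log", "DEBUG"]
    # The CRFSuite variant adds these options; the StanfordNER variant omits them.
    entity = ["--entitytype", "hpo"] if use_crfsuite else []
    backend = ["--crf", "crfsuite"] if use_crfsuite else []
    return [
        ["python", "src/main.py", "train", "--goldstd", train]
        + common + entity + backend,
        ["python", "src/main.py", "test", "--goldstd", test,
         "-o", "pickle", results] + common + entity + backend,
        ["python", "src/evaluate.py", "evaluate", test,
         "--results", results] + common + entity,
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root, or joined with spaces to print the equivalent shell command.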

Test and Evaluate for Test Suites

   python src/main.py test --goldstd tsuite -o pickle data/results_hpo_train --models models/hpo_train --log DEBUG --entitytype hpo --crf crfsuite
   python src/evaluate.py evaluate tsuite --results data/results_hpo_train --models models/hpo_train --log DEBUG --entitytype hpo

Validation rules can be added to the evaluation command with the --rules parameter:

   --rules andor stopwords small_ent twice_validated stopwords gowords posgowords longterms small_len quotes defwords digits lastwords
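
To compare rule combinations, it can help to build the evaluation command programmatically and reject misspelled rule names before a long run starts. The sketch below assumes that `--rules` accepts any subset of the names listed above, which this README does not state explicitly; `with_rules` is a hypothetical helper.

```python
# Rule names as listed above, each written once.
KNOWN_RULES = [
    "andor", "stopwords", "small_ent", "twice_validated", "gowords",
    "posgowords", "longterms", "small_len", "quotes", "defwords",
    "digits", "lastwords",
]


def with_rules(base_command, rules):
    """Return a copy of base_command with a --rules option appended."""
    unknown = sorted(set(rules) - set(KNOWN_RULES))
    if unknown:
        raise ValueError("unknown rule(s): " + ", ".join(unknown))
    if not rules:
        # No rules requested: leave the command unchanged.
        return list(base_command)
    return list(base_command) + ["--rules"] + list(rules)
```

For example, passing one of the evaluate commands above as `base_command` together with `["stopwords", "digits"]` yields the same command ending in `--rules stopwords digits`.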

FAQ

How do I run IHP on new, unlabeled, unstructured text?

Replace the sample corpus in corpora/hpo/test_corpus/ with the new, unlabeled, unstructured text and delete the contents of corpora/hpo/test_ann/. Then run:

    python src/main.py load_corpus --goldstd hpo_test --log DEBUG
    python src/main.py test --goldstd hpo_test -o pickle data/results_hpo_train --models models/hpo_train --log DEBUG
    python src/evaluate.py evaluate hpo_test --results data/results_hpo_train --models models/hpo_train --log DEBUG

The report file data/results_hpo_train_report.txt will list the generated annotations as false positives, because no annotation file was provided.
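
The preparation step described above (replacing the sample corpus and emptying the annotation directory) can be scripted. This is a minimal standalone Python 3 sketch, assuming the new documents are plain files sitting directly in one source directory; the file format IHP expects is not specified here, and `prepare_unlabeled_corpus` is a hypothetical helper. It deletes files, so back up the sample corpus first if it is still needed.

```python
import shutil
from pathlib import Path


def prepare_unlabeled_corpus(source_dir,
                             corpus_dir="corpora/hpo/test_corpus",
                             ann_dir="corpora/hpo/test_ann"):
    """Replace the test corpus with the files in source_dir and clear annotations.

    Returns the names of the copied files, sorted alphabetically.
    """
    source, corpus, ann = Path(source_dir), Path(corpus_dir), Path(ann_dir)
    for directory in (corpus, ann):
        directory.mkdir(parents=True, exist_ok=True)
        # Remove regular files only; subdirectories are left untouched.
        for entry in directory.iterdir():
            if entry.is_file():
                entry.unlink()
    copied = []
    for path in sorted(source.iterdir()):
        if path.is_file():
            shutil.copy(path, corpus / path.name)
            copied.append(path.name)
    return copied
```

After calling it from the repository root, the three commands above can be run unchanged.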

References:

M. Lobo, A. Lamurias, and F. Couto, “Identifying human phenotype terms by combining machine learning and validation rules,” BioMed Research International, vol. 2017, pp. 1-14, 2017 (https://doi.org/10.1155/2017/8565739).
