This script convert HPO ontology to a sqlite database as a tree according the nested set model. It takes HPO obo file as argument and construct a tree with networkx. The Tree is visited to compute left and right index and the tree is stored into a sqlite database using peewee ORM model. There are two main table: Terms and Nodes.
You need python3 with the following modules
pip install networkx
pip install obonet
pip install peewee
pip install tqdm
Download hpo obo file from http://purl.obolibrary.org/obo/hp.obo and run the script as follow
python import_hpo_obo.py hp.obo
## Import gene and disease
wget http://compbio.charite.de/jenkins/job/hpo.annotations.current/lastSuccessfulBuild/artifact/current/phenotype.hpoa
python import_disease.py phenotype.hpoa
wget http://compbio.charite.de/jenkins/job/hpo.annotations/lastSuccessfulBuild/artifact/util/annotation/phenotype_to_genes.txt
python import_gene.py phenotype_to_genes.txt
Terms contains each unique HPO term with name and definition.
fields | description |
---|---|
id | INTEGER "3" |
hpo | TEXT "HP:0000012" |
name | TEXT "Urinary urgency" |
definition | TEXT "Urinary urgency is ..." |
comment | TEXT |
Nodes table contains the HPO ontology as a Tree. Because hpo ontology is a Directed Acyclic Graph, the tree contains duplicate elements. Each node refere to his HPO terms from the term_id field.
fields | description |
---|---|
id | INTEGER (primary key) |
left | INTEGER (left index) |
right | INTEGER (right index) |
depth | INTEGER (depth of the node) |
term_id | INTEGER (term_id) |
Select all children of HPO term HP:0012632 (Abnormal intraocular pressure)
# Get the term id
SELECT id FROM terms WHERE hpo = "HP:0012632" // get 9124
# Get the node left and right
SELECT left,right FROM nodes WHERE term_id=9124 //get 99982 and 99987
# Select all children nodes
SELECT * FROM nodes WHERE left > 99982 AND right < 99987
All steps can be summarized in one query :
SELECT terms.hpo, terms.name FROM nodes
INNER JOIN (SELECT left, right FROM nodes WHERE term_id = (SELECT id FROM terms WHERE hpo = "HP:0012632")) as root ON nodes.left > root.left AND nodes.right < root.right
INNER JOIN terms ON terms.id = nodes.term_id
You can specify the depth if you only want to first level childs.
SELECT terms.hpo, terms.name FROM nodes
INNER JOIN (SELECT left, right FROM nodes WHERE term_id = (SELECT id FROM terms WHERE hpo = "HP:0012632")) as root ON nodes.left > root.left AND nodes.right < root.right AND (nodes.depth - root.depth) < 2
INNER JOIN terms ON terms.id = nodes.term_id