goenrich

Convenient Gene Ontology (GO) enrichment analysis from Python, for use in Python projects.

  1. Builds the GO-ontology graph
  2. Propagates GO-annotations up the graph
  3. Performs enrichment test for all categories
  4. Performs multiple testing correction
  5. Allows for export to pandas for processing and graphviz for visualization
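
Step 2 in the list above, propagating annotations up the graph, can be illustrated with a minimal sketch. It uses plain dictionaries rather than goenrich's own graph object, and the names `propagate`, `parents` and `annotations` are hypothetical, chosen for illustration only. The idea is that a gene annotated to a term also counts for every ancestor of that term.

```python
from collections import defaultdict

def propagate(parents, annotations):
    """Return term -> set of genes, including genes inherited from descendants.

    parents: dict mapping a term to the list of its direct parent terms
    annotations: dict mapping a term to the set of genes annotated to it directly
    """
    propagated = defaultdict(set)
    for term, genes in annotations.items():
        # walk from the annotated term up to the root(s), visiting each ancestor once
        stack, seen = [term], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            propagated[current] |= genes
            stack.extend(parents.get(current, []))
    return dict(propagated)

# toy ontology (not real GO terms): B has parents A and C, which both point to root
parents = {'A': ['root'], 'B': ['A', 'C'], 'C': ['root']}
annotations = {'B': {'g1'}, 'C': {'g2'}}
propagated = propagate(parents, annotations)
```

In this toy example `g1` is annotated only to `B`, yet after propagation it is also counted for `A`, `C` and `root`.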

Supported ids: UniProt ACC, Entrez GeneID

Installation

Install the package from PyPI, then download the ontology and the annotations you need.

pip install goenrich
mkdir db
# Ontology
wget http://purl.obolibrary.org/obo/go/go-basic.obo -O db/go-basic.obo
# UniProt ACC
wget http://geneontology.org/gene-associations/gene_association.goa_ref_human.gz -O db/gene_association.goa_ref_human.gz
# Entrez GeneID
wget ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2go.gz -O db/gene2go.gz

Run GO enrichment

import goenrich

# build the ontology
G = goenrich.obo.graph('db/go-basic.obo')

# use all entrez geneid associations from gene2go as background
# use goenrich.read.goa('db/gene_association.goa_ref_human.gz') for uniprot
background = goenrich.read.gene2go('db/gene2go.gz')
goenrich.enrich.set_background(G, background, 'GeneID', 'GO_ID')

# extract some list of entries as example query
query = set(background['GeneID'].unique()[:20])

# run analysis and obtain results
result = goenrich.enrich.analyze(G, query)

# for additional export to graphviz just specify the gvfile argument
# the show argument keeps the graph reasonably small
result = goenrich.enrich.analyze(G, query, gvfile='example.dot', show='top20')

The first few rows of the resulting table are:

term        name                                            x  p             q         namespace
GO:0044877  macromolecular complex binding                  2  3.422658e-02  0.034227  molecular_function
GO:0000149  SNARE binding                                   2  1.041071e-05  0.000092  molecular_function
GO:1901700  response to oxygen-containing compound          2  1.088637e-02  0.014640  biological_process
GO:0050801  ion homeostasis                                 2  1.653091e-03  0.003393  biological_process
GO:0051353  positive regulation of oxidoreductase activity  2  2.439696e-07  0.000010  biological_process
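
This README does not state which statistical test produces the p column. A common choice for over-representation analysis is the one-sided hypergeometric test, and the sketch below assumes that choice. The function `hypergeom_pvalue` and its argument names are hypothetical and are not part of goenrich's API. The numbers in the example call are made up for illustration.

```python
from math import comb

def hypergeom_pvalue(x, n, K, N):
    """Probability of seeing at least x category members in a query.

    x: query genes that fall in the category
    n: size of the query
    K: size of the category in the background
    N: size of the background
    """
    upper = min(n, K)
    total = comb(N, n)
    # sum the hypergeometric probability mass from x up to the largest possible overlap
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(x, upper + 1))
    return hits / total

# illustrative numbers only: 2 of 20 query genes hit a category of 50 genes
# in a background of 20000 genes
p = hypergeom_pvalue(x=2, n=20, K=50, N=20000)
```

A small p here means that an overlap of this size would be unlikely if the query were drawn at random from the background.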

Generate a PNG image using graphviz

dot -Tpng example.dot > example.png


Parameters

All parameters can be passed to enrich.analyze, as shown below:

go_options = {
        'multiple-testing-correction' : 'bonferroni',
        'alpha' : 0.05,
        'node_filter' : lambda x : x.get('significant', False)
}
goenrich.enrich.analyze(G, query, **go_options)

# export results to graphviz
goenrich.enrich.analyze(G, query, gvfile='example.dot', **go_options)
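
The `node_filter` lambdas in this README call `.get(...)` on their argument and test it with `in`, which suggests that each filter receives a dictionary of node attributes. Assuming that is the case, the sketch below shows how such a predicate selects nodes. The `select` helper and the toy node data are hypothetical and only illustrate the filtering idea.

```python
def select(nodes, node_filter):
    """Keep only the nodes whose attribute dict satisfies node_filter."""
    return {term: attrs for term, attrs in nodes.items() if node_filter(attrs)}

# toy attribute dicts (made-up terms and values)
nodes = {
    'term_a': {'name': 'tested and significant', 'p': 0.001, 'significant': True},
    'term_b': {'name': 'tested, not significant', 'p': 0.4, 'significant': False},
    'term_c': {'name': 'never tested'},
}

# same predicates as used elsewhere in this README
tested = select(nodes, lambda node: 'p' in node)
significant = select(nodes, lambda x: x.get('significant', False))
```

Using `x.get('significant', False)` rather than `x['significant']` keeps the filter safe for nodes that were never tested and therefore carry no such attribute.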

Here is an overview of the available parameters:

read.*:
  experimental = True # don't consider inferred annotations

enrich.analyze:
  node_filter = lambda node : 'p' in node
  show = 'top20' # works for any 'topNUM'

enrich.calculate_pvalues:
  min_hit_size = 2
  min_category_size = 3
  max_category_size = 500
  max_category_depth = 5

enrich.multiple_testing_correction:
  alpha = 0.05
  method = 'benjamin-hochberg' # also supported : 'bonferroni'

export.to_frame:
  node_filter = lambda node: True

export.to_graphviz:
  graph_label = None # if None it is replaced by multiple testing info
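
The two `method` values listed under `enrich.multiple_testing_correction` refer to standard procedures: Bonferroni and Benjamini-Hochberg. The following is a minimal stdlib-only sketch of both, written from their textbook definitions. It is not goenrich's implementation, and the function names are hypothetical.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # go from the largest p-value to the smallest so that q-values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q

alpha = 0.05
pvalues = [0.01, 0.04, 0.03, 0.005]  # made-up p-values
q_bh = benjamini_hochberg(pvalues)
q_bonf = bonferroni(pvalues)
significant_bh = [q <= alpha for q in q_bh]
significant_bonf = [q <= alpha for q in q_bonf]
```

With these made-up inputs Bonferroni is the stricter of the two: it keeps only the p-values 0.01 and 0.005 below `alpha`, while Benjamini-Hochberg keeps all four.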

Licence

This work is licensed under the MIT licence.

Contributions are welcome!
