This is a simple program that I use to query the NCBI taxonomy tree. It requires the ETE Python library (ete.cgenomics.org) and sqlite3 to work. Features are still very rudimentary. Please cite this repository if you use the program.
April 17th 2012
- Name searches are not case sensitive
- Added synonym support for name translation
- Added fuzzy search
- ETE (ete.cgenomics.org)
- sqlite3
Fuzzy search (optional) requires:
Python 2.7 or pysqlite2 compiled with load_extension support. For instance, download pysqlite2 from PyPI, comment out the following line in setup.cfg:
define=SQLITE_OMIT_LOAD_EXTENSION
and run "sudo python setup.py install".
The sqlite3 extension "levenshtein.sqlext" must also be present, so you will need to compile the SQLite extension included in this package:
$ cd SQLite-Levenshtein
$ make
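Before compiling anything, it can help to check whether the Python interpreter you plan to use can load SQLite extensions at all. The following is a minimal sketch using only the standard sqlite3 module; the helper name supports_load_extension is hypothetical and is not part of this package.

```python
import sqlite3


def supports_load_extension():
    """Return True if this sqlite3 module can load SQLite extensions."""
    conn = sqlite3.connect(":memory:")
    try:
        # The method is missing when the module was built with
        # SQLITE_OMIT_LOAD_EXTENSION, which raises AttributeError here.
        conn.enable_load_extension(True)
        return True
    except (AttributeError, sqlite3.NotSupportedError, sqlite3.OperationalError):
        return False
    finally:
        conn.close()


print(supports_load_extension())
```

If this prints False, fuzzy search will most likely not work until pysqlite2 is rebuilt as described above.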
$ wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
$ tar zxf taxdump.tar.gz
$ python update_taxadb.py # This may take a while
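For orientation, the taxdump archive contains a file called names.dmp, in which fields are separated by a tab, a pipe and a tab, and every row ends with a tab and a pipe. The sketch below shows one way such rows could be turned into a case-insensitive name index that also covers synonyms. It is not the code in update_taxadb.py; the function names and the sample rows are made up for illustration.

```python
def parse_names_line(line):
    """Split one names.dmp row into (taxid, name, name_class)."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]  # drop the row terminator
    taxid, name, _unique_name, name_class = line.split("\t|\t")
    return int(taxid), name, name_class


def build_name_index(lines):
    """Map lowercased names (scientific names and synonyms) to taxids."""
    index = {}
    for line in lines:
        taxid, name, _name_class = parse_names_line(line)
        # Keep the first taxid seen for a given lowercased name.
        index.setdefault(name.lower(), taxid)
    return index


# Illustrative rows in the names.dmp layout, not a real excerpt of the dump.
sample = [
    "9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n",
    "9606\t|\thuman\t|\t\t|\tgenbank common name\t|\n",
    "9913\t|\tBos taurus\t|\t\t|\tscientific name\t|\n",
]
index = build_name_index(sample)
print(index.get("HOMO SAPIENS".lower()))
```

Lowercasing the keys is what makes lookups case insensitive, and storing every name class under the same taxid is a simple way to get synonym support.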
$ python ./ncbi_query.py -h
$ python ./ncbi_query.py -n Bos taurus, Gallus gallus, Homo sapiens
$ python ./ncbi_query.py -n Bos taurus, Gallus gallus, Homo sapiens -x
$ python ./ncbi_query.py -n Bos taurus, Gallus gallus, Homo sapiens -i
$ python ./ncbi_query.py -t 9913 9031 9606 -x
The fuzzy factor sets the minimum level of similarity required to report a match. For example, 0.8 means that only names at least 80% similar to the original string (that is, differing in roughly 20% of its characters at most) will be considered. The comparison is case sensitive.
$ python ./ncbi_query.py -n Bos tauras, gallus, Homo sapien --fuzzy 0.8
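To make the fuzzy factor more concrete, here is a small pure-Python sketch. It assumes that a similarity of 0.8 allows a Levenshtein distance of at most 20% of the query length, rounded up. That mapping is an assumption for illustration, and the program itself relies on the compiled SQLite extension rather than on this code. The names levenshtein and fuzzy_match are hypothetical.

```python
import math


def levenshtein(a, b):
    """Case-sensitive edit distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def fuzzy_match(query, candidates, sim=0.8):
    """Return the closest candidate within the allowed distance, or None."""
    # Assumed mapping: a similarity of sim tolerates (1 - sim) of the query length.
    max_edits = math.ceil(len(query) * (1 - sim))
    best = min(candidates, key=lambda name: levenshtein(query, name))
    return best if levenshtein(query, best) <= max_edits else None


names = ["Bos taurus", "Gallus gallus", "Homo sapiens"]
for query in ["Bos tauras", "gallus", "Homo sapien"]:
    print(query, "->", fuzzy_match(query, names, 0.8))
```

Under this assumption the two misspelled names resolve to their intended species, while "gallus" is too far from "Gallus gallus" (seven edits) to match at 0.8.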
Contact: jhcepas[at]gmail.com