Libraries to help process Metataxonomy data, specifically NCBI gid, accession_numbers, and taxon ID.
I have taken all the scripts I have written and created libraries for my own use. And why not share?
- I use these libraries to parse alignments against metagenome databases, for example results from IMSA-A (my project on github).
- Accepts alignment files blast format6 (BLAST) or sam (BWA, BOWTIE2, etc) or psl (BLAT).
- Generates useful alignment statistics for all file types; enables comparative analyses.
- Use Genbank IDs (gid) or NCBI Accession numbers. Note that NCBI has deprecated the use of gid.
- Uses local data files improving speed over Eutils.
- Convert taxon ID to other taxons in the lineage
- Create cladograms
- Split a fasta file into chunks (by line count)
- (more)
- List of helpful bash tricks
- Written for Python 2.7.
- I recommend using Anaconda when running on Windows.
- ETE2 (etetoolkit.org) :: for making cladograms only. I use ete2-2.2.1072-py2.7.egg, which is years out of date. (Because I didn't need new features)
Project creation 2018-01-24 README.md drafted