SATIVA (Semi-Automatic Taxonomy Improvement and Validation Algorithm) is a pipeline that uses the Evolutionary Placement Algorithm (EPA, [1]) to identify taxonomically mislabeled sequences and suggest corrections. Internally, SATIVA relies on RAxML [2] for likelihood computations and on the ETE library [3] for tree topology manipulations in Python.
Currently, only Linux and OSX (Mac) systems are supported.
- Make sure Python 2.6+ is installed (Python 3 is not supported!).
- Make sure you have a recent C compiler (we recommend GCC 4.6+ or clang 3.3+ for AVX support). If you have an up-to-date OS distribution (Ubuntu 12.04+, OSX 10.8+, etc.), there is nothing to worry about. In a cluster environment, you might need to select an appropriate compiler version, e.g.:
    module load gcc/4.7.0
  (please refer to your cluster documentation for details)
- Run the installation script:
    ./install.sh
  If you get compilation errors, try disabling AVX:
    ./install.sh --no-avx
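If you are unsure which interpreter `python` resolves to on your system, a small check like the one below can confirm the version requirement (2.6 or newer, but not Python 3) before you install. This is a minimal sketch; `python_supported` is a hypothetical helper, not part of SATIVA.

```python
from __future__ import print_function  # lets the same file run on Python 2 and 3

import sys


def python_supported(version_info=None):
    """Return True for Python 2.6 or any newer 2.x release."""
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    return major == 2 and minor >= 6


if __name__ == "__main__":
    current = "%d.%d" % tuple(sys.version_info[:2])
    if python_supported():
        print("OK: Python " + current)
    else:
        print("Unsupported: Python " + current + " (SATIVA needs 2.6+, not 3)")
```

Run it with the same interpreter you intend to use for `sativa.py`.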
SATIVA requires two input files: an alignment (FASTA or PHYLIP) and a text file with taxonomic annotations (matched by sequence name). Furthermore, you must choose the nomenclature code via the -x option (e.g., BAC(teriological) for Bacteria and Archaea).
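Because annotations are matched to alignment sequences by name, it can help to check that the two files agree before starting a run. The sketch below assumes a taxonomy file with one `name<TAB>rank1;rank2;...` entry per line, and either a FASTA alignment or a sequential PHYLIP alignment with one `name sequence` pair per line. These format details and all function names are assumptions for illustration; compare against the files in `example/` and `./sativa.py -h` for the exact format SATIVA expects.

```python
from __future__ import print_function


def alignment_names(lines):
    """Return sequence names from FASTA or sequential (one line per taxon) PHYLIP.

    ASSUMPTION: PHYLIP starts with a "<ntaxa> <nsites>" header line;
    interleaved PHYLIP is not handled.
    """
    lines = [l.strip() for l in lines if l.strip()]
    if lines and lines[0].startswith(">"):
        # FASTA: the name is the first word after '>'
        return [l[1:].split()[0] for l in lines if l.startswith(">")]
    ntaxa = int(lines[0].split()[0])
    return [l.split()[0] for l in lines[1:1 + ntaxa]]


def taxonomy_names(lines):
    """Return the sequence names from an (assumed) 'name<TAB>lineage' file."""
    # split(None, 1) only cuts at the first whitespace run, so spaces
    # inside the lineage (e.g. species names) do not matter here
    return [l.split(None, 1)[0] for l in lines if l.strip()]


def check_names(aln_lines, tax_lines):
    """Return (names missing an annotation, annotated names missing a sequence)."""
    aln = set(alignment_names(aln_lines))
    tax = set(taxonomy_names(tax_lines))
    return sorted(aln - tax), sorted(tax - aln)


if __name__ == "__main__":
    aln = ["3 8", "seq1 ACGTACGT", "seq2 ACGTACGA", "seq3 ACGAACGT"]
    tax = ["seq1\tBacteria;Proteobacteria", "seq2\tBacteria;Firmicutes"]
    no_tax, no_seq = check_names(aln, tax)
    print("in alignment but not annotated:", no_tax)
    print("annotated but not in alignment:", no_seq)
```

Open files can be passed directly, e.g. `check_names(open("test.phy"), open("test.tax"))`, since iterating a file yields its lines.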
Sample command line to run SATIVA with 2 threads:
cd example
../sativa.py -s test.phy -t test.tax -x BAC -T 2
The output is a text file containing a list of identified mislabels, along with the corresponding confidence scores and proposed taxonomic corrections.
NOTE: If you omit the -T parameter, SATIVA will start one thread per logical CPU in your system. Although this is usually what you want, it might lead to a major slowdown if some of the CPUs are already in use by other programs (e.g., if you run SATIVA on a shared server). If you encounter this problem, please try reducing the number of threads with -T!
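On a shared machine, one way to follow the advice above is to count the logical CPUs and leave a few free before passing an explicit -T. The sketch below uses Python's `multiprocessing.cpu_count()`, which reports logical CPUs but knows nothing about load from other programs; the `reserved` knob and both function names are hypothetical, and the command only mirrors the sample invocation shown earlier.

```python
from __future__ import print_function

import multiprocessing


def choose_threads(reserved=1, available=None):
    """Pick a value for -T, leaving `reserved` logical CPUs for other jobs.

    Always returns at least 1.
    """
    if available is None:
        available = multiprocessing.cpu_count()  # number of logical CPUs
    return max(1, available - reserved)


def sativa_command(alignment, taxonomy, code, threads):
    """Build the argument list for the sample invocation with an explicit -T."""
    return ["../sativa.py", "-s", alignment, "-t", taxonomy,
            "-x", code, "-T", str(threads)]


if __name__ == "__main__":
    cmd = sativa_command("test.phy", "test.tax", "BAC", choose_threads(reserved=2))
    print(" ".join(cmd))
    # To actually launch it from the example directory:
    #   import subprocess; subprocess.call(cmd)
```

Reserving CPUs by count is a deliberately simple heuristic; on a cluster, the thread count granted by your scheduler is usually the better value for -T.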
For additional options, please refer to the online help:
./sativa.py -h
SATIVA is integrated with the most recent (unstable) version of the ARB software.
Development builds: ftp://ftp.arb-silva.de/ARB/builds/
Source: http://svn.mikro.biologie.tu-muenchen.de/readonly/trunk/
For the time being, please direct your questions to the RAxML google group:
https://groups.google.com/forum/?hl=en#!forum/raxml
Alexey M. Kozlov, Jiajie Zhang, Pelin Yilmaz, Frank Oliver Glöckner and Alexandros Stamatakis. Phylogeny-aware Identification and Correction of Taxonomically Mislabeled Sequences. Submitted. bioRxiv preprint
[1] Berger, S. A., Krompass, D., and Stamatakis, A. (2011) Performance, Accuracy, and Web Server for Evolutionary Placement of Short Sequence Reads under Maximum Likelihood. Systematic Biology, 60(3), 291–302. doi:10.1093/sysbio/syr010
[2] Stamatakis, A. (2014) RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics, 30(9), 1312–1313. doi:10.1093/bioinformatics/btu033
[3] Huerta-Cepas, J., Dopazo, J., and Gabaldón, T. (2010) ETE: a python Environment for Tree Exploration. BMC Bioinformatics, 11(1), 24. doi:10.1186/1471-2105-11-24