jueshengong/psytrans

NAME
       psytrans.py - Parasite & Symbiont Transcriptome Separation

SYNOPSIS
       python psytrans.py [QUERIES] [-H FILE] [-S FILE] [OPTIONS]
       python psytrans.py [QUERIES] [-b BLASTRESULTSFILE] [OPTIONS]

DESCRIPTION
       psytrans.py separates the sequences of a host species from those of its main symbiont(s) or parasite(s) based on Support Vector Machine classification.
       The program takes as input a file in FASTA format with the nucleotide sequences to be classified.
       The program also requires a file with protein sequences of a species related to the host, and a file with protein sequences of a species related to the symbiont (or parasite).
       The queries are compared to these two files using BLASTX.
       Alternatively, the user can provide the output of pre-computed BLASTX searches (in tabular format: -outfmt 6 or 7).
       The classification is then carried out using the command line tools from libsvm.
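
       As an illustration of the tabular BLAST input mentioned above, here is a minimal sketch that extracts the best (smallest) e-value per query. It assumes the default column layout of -outfmt 6/7, where the query id is column 1 and the e-value is column 11. The function name best_evalues is hypothetical and is not taken from psytrans.py.

```python
def best_evalues(lines):
    """Return {query_id: smallest e-value} from BLAST tabular lines."""
    best = {}
    for line in lines:
        line = line.strip()
        # -outfmt 7 adds comment lines starting with '#'; skip them and blanks.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query_id = fields[0]
        evalue = float(fields[10])  # column 11 in the default tabular layout
        if query_id not in best or evalue < best[query_id]:
            best[query_id] = evalue
    return best


# Made-up example records, for illustration only.
example = [
    "# BLASTX example comment line",
    "contig1\tprotA\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-50\t200",
    "contig1\tprotB\t50.0\t80\t40\t0\t1\t240\t1\t80\t1e-05\t50",
    "contig2\tprotC\t70.0\t90\t27\t0\t1\t270\t1\t90\t3e-20\t95",
]
print(best_evalues(example))
```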

DEPENDENCIES
       psytrans requires makeblastdb and blastx from the NCBI BLAST+ distribution, unless the user provides pre-computed BLAST results.
       psytrans also requires three command line utilities from libsvm: svm-scale, svm-train and svm-predict.
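
       Before a long run it can help to check that these executables are on the PATH. The following is a small sketch using Python's standard shutil.which; the function name missing_tools and its parameters are hypothetical and are not part of psytrans.py.

```python
import shutil

BLAST_TOOLS = ["makeblastdb", "blastx"]
LIBSVM_TOOLS = ["svm-scale", "svm-train", "svm-predict"]


def missing_tools(have_blast_results=False, which=shutil.which):
    """List the required executables that cannot be found on the PATH.

    The BLAST+ tools are only checked when no pre-computed BLAST
    results are supplied, mirroring the requirement stated above.
    """
    tools = list(LIBSVM_TOOLS)
    if not have_blast_results:
        tools = BLAST_TOOLS + tools
    return [tool for tool in tools if which(tool) is None]


print("Missing tools:", missing_tools())
```

       An empty list means every dependency was found.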

OPTIONS
   Generic Program Information

       -h, --help
              Print a usage message briefly summarizing the command-line options.

   Global options

       -R, --restart
              Restart the script from the last checkpoint.

       -p, --nbThreads
              Number of threads to use for the blast searches and for the SVM training.

       -V, --verbosemode
              Run the script in verbose mode.

       -t, --tempDir
              Specify the name of the temporary directory.

       -X, --clearTemp
              Clear all temporary data in the temporary directory upon completion.

       -z, --stopAfter
              Choices:['db','runBlast','parseBlast','kmers','SVM']
              Stop the process once the given stage has completed.
              db refers to the database creation stage;
              runBlast refers to the BLAST search stage;
              parseBlast refers to the separation of unambiguous and ambiguous sequences;
              kmers refers to the preparation of the SVM input;
              SVM refers to the SVM training and testing stage.
              SVM refers to the SVM training and testing stage.
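
       The effect of -z can be pictured with the sketch below. It assumes the stages run in the order in which the choices are listed; the function name stages_to_run is hypothetical and does not come from psytrans.py.

```python
# Stage names as listed for -z, assumed to run in this order.
STAGES = ["db", "runBlast", "parseBlast", "kmers", "SVM"]


def stages_to_run(stop_after=None):
    """Return the stages executed when stopping after `stop_after`."""
    if stop_after is None:
        return list(STAGES)
    if stop_after not in STAGES:
        raise ValueError("unknown stage: %r" % (stop_after,))
    return STAGES[:STAGES.index(stop_after) + 1]


print(stages_to_run("parseBlast"))
```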

   Preparation of training set options

       -e, --maxBestEvalue
              Set the maximum value for the best e-value to be used to classify unambiguous sequences.

       -n, --numberOfSeq
              Set the maximum number of training & testing sequences.
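
       To make the role of -e and -n concrete, here is a sketch of one plausible selection rule. This rule is an assumption and may differ from what psytrans.py actually does: a query counts as unambiguous when it has hits against only one of the two databases and its best e-value does not exceed the threshold, and the cap on the number of sequences is applied per class. All names below are hypothetical.

```python
def pick_training_ids(host_best, symb_best, max_best_evalue, max_seqs):
    """Select unambiguous query ids under the assumed rule.

    host_best / symb_best map query id -> best e-value against the
    host-related and symbiont-related databases respectively.
    """
    host_ids, symb_ids = [], []
    for query_id in sorted(set(host_best) | set(symb_best)):
        host_e = host_best.get(query_id)
        symb_e = symb_best.get(query_id)
        if symb_e is None and host_e <= max_best_evalue:
            host_ids.append(query_id)   # hits the host database only
        elif host_e is None and symb_e <= max_best_evalue:
            symb_ids.append(query_id)   # hits the symbiont database only
        # Queries hitting both databases, or with weak hits, stay ambiguous.
    return host_ids[:max_seqs], symb_ids[:max_seqs]


# Made-up e-values, for illustration only.
host_best = {"c1": 1e-50, "c2": 1e-3, "c3": 1e-30}
symb_best = {"c3": 1e-40, "c4": 1e-25}
print(pick_training_ids(host_best, symb_best, max_best_evalue=1e-20, max_seqs=1000))
```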

   Kmer parameters

       -c, --minWordSize
              Set the minimum value of DNA word length.

       -k, --maxWordSize
              Set the maximum value of DNA word length.
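
       The two word-size options define a range of k-mer lengths. The sketch below computes k-mer frequencies over that range and writes them as one line of the libsvm sparse format (label index:value ...). Whether psytrans.py uses frequencies or raw counts, and how it orders the features, is not stated here, so treat those choices and the function names as assumptions.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, min_word_size, max_word_size):
    """Frequencies of all A/C/G/T words for each length in the range."""
    seq = seq.upper()
    features = []
    for k in range(min_word_size, max_word_size + 1):
        counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
        words = ["".join(p) for p in product("ACGT", repeat=k)]
        # Words containing other characters (e.g. N) are ignored.
        total = sum(counts[w] for w in words)
        for word in words:
            features.append(counts[word] / total if total else 0.0)
    return features


def to_libsvm_line(label, features):
    """Format one sample as 'label 1:v1 2:v2 ...' (1-based indices)."""
    pairs = ("%d:%g" % (i, v) for i, v in enumerate(features, start=1))
    return "%s %s" % (label, " ".join(pairs))


print(to_libsvm_line(1, kmer_frequencies("ACGTACGT", 1, 2)))
```

       With word sizes from 1 to 2, each sequence yields 4 + 16 = 20 features.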

EXAMPLE
       You start with an assembly containing a mixture of sequences from a host A and a symbiont (or parasite) B: host_and_symb.fasta
       You also provide a file with proteins from a species related to the host: related_host_proteins.fasta
       and a file with proteins from a species related to the symbiont: related_symb_proteins.fasta
       You can then start the Psytrans process as follows:

              python psytrans.py host_and_symb.fasta -H related_host_proteins.fasta -S related_symb_proteins.fasta

       Wait a few hours (depending on the number of sequences), and in the current directory you will find a new file starting with the prefix `host_' and another file starting with the prefix `symb_', corresponding to the sequences classified as host or symbiont (or parasite) respectively.

       To run the program using 8 threads and using /home/user/tmp as a temporary directory, use the following command:

              python psytrans.py host_and_symb.fasta -H related_host_proteins.fasta -S related_symb_proteins.fasta -p 8 -t /home/user/tmp
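
       Once a run has finished, the result files described above can be summarized with a short script. This sketch assumes the output files are in FASTA format and identifies them only by their host_ and symb_ prefixes; the function names are hypothetical.

```python
import glob
import os


def count_fasta_records(path):
    """Count FASTA records by counting header lines starting with '>'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def summarize_outputs(directory="."):
    """Map each host_* / symb_* file in `directory` to its record count."""
    summary = {}
    for prefix in ("host_", "symb_"):
        pattern = os.path.join(directory, prefix + "*")
        for path in sorted(glob.glob(pattern)):
            summary[os.path.basename(path)] = count_fasta_records(path)
    return summary


print(summarize_outputs("."))
```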

AUTHOR
       Written by Sylvain Forêt and Jue-Sheng Ong.

REPORTING BUGS
       Report bugs to sylvain.foret@anu.edu.au
       Psytrans repository <https://github.com/sylvainforet/psytrans>

COPYRIGHT
       Copyright © 2014 Sylvain Forêt & Jue-Sheng Ong.

       psytrans is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under the terms of the GNU General Public License
       version 3 or later. For more information about these matters see http://www.gnu.org/licenses/.
