SeqDistK-Pyhon

For Windows Users

For Windows users, we recommend the Windows distribution, which has a convenient graphical user interface:
https://github.com/htczero/SeqDistK. It is C++ based and much faster than this Python distribution.

SeqDistK-Pyhon

Introduction

Phylogenetic tools are fundamental to studies of evolution and taxonomy. In this paper, we present SeqDistK, a novel tool for alignment-free phylogenetic analysis. SeqDistK batch computes the pairwise distance matrix between biological sequences, using seven popular k-mer based dissimilarity measures. Based on the matrix, SeqDistK constructs a phylogenetic tree using the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) algorithm. Using a gold-standard dataset of 16S rRNA sequences and the associated phylogenetic tree, we benchmarked the accuracy and efficiency of SeqDistK. We found that the measure d2S (k=5, M=2) performed best, correctly clustering and classifying all sequences. Compared to multiple sequence aligners such as MUSCLE, ClustalW2 and MAFFT, SeqDistK was tens to hundreds of times faster, which helps eliminate the computational limits encountered in large-scale phylogenetic analysis.
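To illustrate the tree-building step described above, here is a minimal pure-Python sketch of UPGMA clustering on a small distance matrix. It is not SeqDistK's own code: the function name `upgma`, the nested-tuple tree representation, and the toy matrix are all assumptions made for this example, and branch lengths are omitted for brevity.

```python
def upgma(names, dist):
    """Cluster labelled items with UPGMA and return a nested-tuple tree.

    names: list of labels; dist: symmetric distance matrix (list of lists).
    Hypothetical helper for illustration only; branch lengths are not tracked.
    """
    # Active clusters: id -> (subtree, number of leaves).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by the pair of cluster ids.
    d = {frozenset((i, j)): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        pair = min(d, key=d.get)
        a, b = sorted(pair)
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        del d[pair]
        # The distance from the merged cluster to every other cluster is the
        # leaf-count-weighted average of the two old distances.
        for c in clusters:
            dac = d.pop(frozenset((a, c)))
            dbc = d.pop(frozenset((b, c)))
            d[frozenset((next_id, c))] = (size_a * dac + size_b * dbc) / (size_a + size_b)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    (tree, _), = clusters.values()
    return tree


# Toy example: A and B are closest, then C joins them, then D.
toy_names = ["A", "B", "C", "D"]
toy_dist = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
print(upgma(toy_names, toy_dist))
```

The weighted average is what distinguishes UPGMA from single- or complete-linkage clustering: every original sequence contributes equally to the distance between two clusters.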

Requirements

numpy 1.16
numba 0.43.1
tqdm
python 3.7

Miniconda is recommended. Anaconda3 also works.

Example

Start the program

python main.py

Suppose you have N input sequence files in a directory whose path is '/home/seqs'.

Input the directory path

Input the directory path of sequences : /home/seqs

Input k, the k-mer size you want to compute (k > 0). See the reference paper for details on how to choose k.

For a single k, input an integer (> 0), such as 4

Input the k : 4

For a range of k, input kmin-kmax-step. For example (without quotation marks), '2-10-2' specifies k = [2, 4, 6, 8, 10].

Input the k : 2-10-2
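The two accepted forms of the k input can be sketched as a small parser. This is only an illustration of the format described above. `parse_k` is a hypothetical name, not a function taken from the SeqDistK source, and the real program may validate its input differently.

```python
def parse_k(spec):
    """Parse '4' or 'kmin-kmax-step' into a list of k values.

    Hypothetical helper that mirrors the input format described in this README.
    """
    parts = spec.strip().split("-")
    if len(parts) == 1:
        ks = [int(parts[0])]
    elif len(parts) == 3:
        kmin, kmax, step = (int(p) for p in parts)
        if step <= 0 or kmin > kmax:
            raise ValueError("expected kmin <= kmax and step > 0")
        # kmax is included when it falls exactly on a step.
        ks = list(range(kmin, kmax + 1, step))
    else:
        raise ValueError("expected 'k' or 'kmin-kmax-step'")
    if any(k <= 0 for k in ks):
        raise ValueError("k must be > 0")
    return ks


print(parse_k("4"))       # [4]
print(parse_k("2-10-2"))  # [2, 4, 6, 8, 10]
```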

Choose the dissimilarity measures. See the reference paper for details on how to choose a measure.

0. Ma  
1. Ch  
2. Eu  
3. d2  
4. Hao  
5. d2S  
6. d2Star  
For example (without quotation marks), '1,2,3,4'  
Input the dissimilarities : 0,1,2,4,5
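For readers unfamiliar with k-mer based measures, the simplest ones can be sketched in a few lines. The example below assumes that Ma, Eu and Ch stand for the Manhattan, Euclidean and Chebyshev distances between k-mer frequency vectors, which this README does not spell out. The function names are hypothetical, and SeqDistK's actual normalization may differ. The d2, Hao, d2S and d2Star measures are not shown.

```python
from collections import Counter
from itertools import product
import math


def kmer_freqs(seq, k):
    """Relative frequency of every DNA k-mer in seq, in a fixed ACGT order.

    Windows containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in kmers)
    if total == 0:
        raise ValueError("sequence contains no valid k-mer of this size")
    return [counts[w] / total for w in kmers]


def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def chebyshev(x, y):
    return max(abs(a - b) for a, b in zip(x, y))


# Two toy sequences compared with k = 2.
x = kmer_freqs("ACGTACGTAC", 2)
y = kmer_freqs("AAAACCCCGG", 2)
print(manhattan(x, y), euclidean(x, y), chebyshev(x, y))
```

Each frequency vector has 4^k entries, so memory and time grow quickly with k.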

If d2S or d2Star was chosen in the previous step, you also need to give M, the order of the Markov background model. See the reference paper for details on how to choose M.

For a single M, input an integer (>= 0)

Input the possibility order : 2

For a series of M, separate the values with ','. For example (without quotation marks), '0, 1, 2, 3'

Input the possibility order : 0,1,2
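To make the meaning of the order M concrete, the sketch below estimates an order-M Markov model from a single sequence: the probability of each base given the M bases before it. It only illustrates what "order" means. It is not how SeqDistK computes d2S or d2Star, and `markov_transitions` is a hypothetical name.

```python
from collections import Counter


def markov_transitions(seq, m):
    """Estimate P(next base | previous m bases) from one sequence.

    Returns a dict mapping each observed (m + 1)-mer to the conditional
    probability of its last base given its first m bases.
    With m = 0 this reduces to plain base frequencies.
    """
    if m < 0:
        raise ValueError("m must be >= 0")
    seq = seq.upper()
    # Count every word made of a context (m bases) plus the following base.
    counts = Counter(seq[i:i + m + 1] for i in range(len(seq) - m))
    # Total occurrences of each context that is followed by some base.
    context_totals = Counter()
    for word, n in counts.items():
        context_totals[word[:m]] += n
    return {word: n / context_totals[word[:m]] for word, n in counts.items()}


print(markov_transitions("AACC", 0))    # {'A': 0.5, 'C': 0.5}
print(markov_transitions("ACACAC", 1))  # {'AC': 1.0, 'CA': 1.0}
```

A higher M captures longer-range dependence between neighbouring bases but needs more sequence data to estimate reliably.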

Input the path you want to save the results. For example, "/home/save"

Input the path you want to save : /home/save

Confirm that the parameters are correct before submitting the computation.

Check the parameters : 'yes' or 'no'  
yes  # input yes and press enter if the parameters are correct.

Manuals

Structure of working directory

Each dir_x is treated as a separate single-directory case.

FAQ

Q1: Can I pause the program while it is running?
A1: No.

Q2: What is the range of k?
A2: k must be no more than 15 (k <= 15).

Q3: What is the difference between the Windows version and this one?
A3: The Windows version uses C# and has a UI. Furthermore, because it uses multi-threading, the Windows version is faster than the Python version.

Q4: Can I use it on macOS?
A4: Yes.
