GitHub - brwnj/sequence_clustering: De novo identification and quantification of sequence data.

#Sequence clustering pipeline

De novo identification and quantification of sequence data utilizing edit distance as a metric to group similar sequences. Sequences are paired longest to shortest.

There are many algorithms to accomplish a similar goal and many that better account for sequence identity matching without the bias using longer sequences first.

usage: sequence_clustering.py [-h] [-v] {consensus,quantify,matrix} ...

De novo identification and quantification of sequence data.

positional arguments:
  {consensus,quantify,matrix}
                        commands
    quantify            quantify unique and similar sequences
    consensus           build observed sequence library
    matrix              generate counts matrix

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit

##Quantifying sequences in FASTQ

usage: sequence_clustering.py quantify [-h] [-c CUTOFF] [-m MISMATCH] FASTQ

Find and quantify unique and similar sequences within a FASTQ.

positional arguments:
  FASTQ        reads to process

optional arguments:
  -h, --help   show this help message and exit
  -c CUTOFF    minimum allowable seq length (default: 18)
  -m MISMATCH  mismatch tolerance when grouping bins (default: 3)

##Finding consensus of observed sequences

usage: sequence_clustering.py consensus [-h] [-c CUTOFF] [-m MISMATCH]
                                        BINS [BINS ...]

Build consensus of sequences across all samples.

positional arguments:
  BINS         results of `quantify`

optional arguments:
  -h, --help   show this help message and exit
  -c CUTOFF    minimum allowable count (default: 100)
  -m MISMATCH  mismatch tolerance when grouping bins (default: 3)

##Counts of observed sequences across consensus bins

usage: sequence_clustering.py matrix [-h] [-c CUTOFF] [-m MISMATCH]
                                     [-n {deseq,totalcount}]
                                     CONSENSUS COUNTS [COUNTS ...]

Generate counts matrix

positional arguments:
  CONSENSUS             result of `consensus`
  COUNTS                results of `quantify`

optional arguments:
  -h, --help            show this help message and exit
  -c CUTOFF             minimum allowable count for individual sample
                        sequences (default: 100)
  -m MISMATCH           mismatch tolerance when grouping bins (default: 3)
  -n {deseq,totalcount}
                        output normalized table using either DESeq or total
                        count method (default: None)

Name		Name	Last commit message	Last commit date
Latest commit History 35 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
sequence_clustering.py		sequence_clustering.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

.gitignore

.gitignore

LICENSE

LICENSE

README.md

README.md

sequence_clustering.py

sequence_clustering.py

Repository files navigation

About

Releases

Packages

Languages

License

brwnj/sequence_clustering

Folders and files

Latest commit

History

Repository files navigation

About

Resources

License

Stars

Watchers

Forks

Languages