FerrisTools

FerrisTools is a collection of small command-line tools for manipulating FASTA files, developed for use by researchers at Children's Hospital New Orleans. Some of the scripts perform small general operations; others are intended to integrate our workflow with QIIME more smoothly.

create_mapping.py

USAGE:

python create_mapping.py -m MappingFile.txt -o NewMapping.txt

suggested workflow:

run this script as shown above

run check_id_map.py:

  macqiime check_id_map.py -m NewMapping.txt -o checkmap -j run_prefix

check log of check_id_map.py for relevant errors:

  grep -v "Removed bad chars" checkmap/NewMapping.log

take subsets of the .fna and .qual files as necessary so that both contain the same set of sequences

run split_libraries.py:

  macqiime split_libraries.py -e 0 -m checkmap/NewMapping_corrected.txt -f MySeqs.fna -q MyQual.qual -o splib-out -j run_prefix -b 8

create_mapping.py adds a 'run_prefix' column to the mapping file, allowing QIIME's split_libraries.py to demultiplex reads by an already determined sample name as well as by barcode. This is useful when the sequencing facility has already labeled the reads by sample.
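The effect on the mapping file can be sketched as follows. This is a minimal illustration, not the script's actual code. It assumes a tab-separated QIIME mapping file whose last column is Description, and it assumes (hypothetically) that each row's run_prefix is simply its sample ID. The function name `add_run_prefix` is made up for this example.

```python
def add_run_prefix(lines):
    """Return mapping-file lines with a run_prefix column inserted
    before the final (Description) column.

    Assumption: the run_prefix value is the row's sample ID (first column).
    """
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if line.startswith("#SampleID"):
            value = "run_prefix"  # header row gets the column name
        elif line.startswith("#"):
            out.append(line)  # keep other comment lines untouched
            continue
        else:
            value = fields[0]
        out.append("\t".join(fields[:-1] + [value, fields[-1]]))
    return out
```

The new column is placed before the last one because QIIME expects Description to remain the final column of the mapping file.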

fasta.py :

USAGE:

python fasta.py keyfile.txt fastafile.fna [--liberal | -l]

Runs QA steps to remove primers, barcodes, homopolymers and chimeras from the data.

ARGS:

keyfile.txt: the "keyfile" or mapping file.
fastafile.fna: the FASTA file to be preprocessed.
--liberal or -l: see odd cases below. If this flag is not provided, the default is 'conservative' mode.

Odd cases:

sequence id not in keyfile: throw out sequence, warn
sequence does not start with barcode: ignore
sequence does not match primer (maybe primer was already stripped):
- conservative mode - throw out
- liberal mode - ignore (whole operation is idempotent in liberal mode)
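The odd-case rules above can be sketched as a single decision function. This is only an illustration of the barcode and primer handling, under stated assumptions: the keyfile has already been parsed into a dict mapping each sequence ID to a `(barcode, primer)` tuple, IDs are matched exactly, and the name `preprocess` is hypothetical. Homopolymer and chimera removal are not covered.

```python
import sys


def preprocess(seq_id, seq, key, liberal=False):
    """Return the trimmed sequence, or None if it should be thrown out.

    key maps a sequence ID to a (barcode, primer) tuple (assumed layout).
    """
    if seq_id not in key:
        sys.stderr.write("warning: %s not in keyfile, discarding\n" % seq_id)
        return None
    barcode, primer = key[seq_id]
    # No barcode at the start: ignore and carry on with the sequence as-is.
    if seq.startswith(barcode):
        seq = seq[len(barcode):]
    if seq.startswith(primer):
        return seq[len(primer):]
    # Primer not found: keep in liberal mode, throw out in conservative mode.
    return seq if liberal else None
```

In liberal mode an already-stripped sequence normally passes through unchanged, which is what makes a second run harmless; in conservative mode the same sequence would be thrown out.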

seq_subset.py :

USAGE:

python seq_subset.py <fastafile> namestems

Where namestems is a quoted, comma-separated list, e.g. "JN031811-1, JN031811-2, JN031811-3"

--or--

python seq_subset.py <fastafile> -f stemsfile

Where stemsfile is the path to a file containing one list entry per line.

Any sequences in the FASTA file whose names begin with one of the entries in the list will be printed.
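The prefix matching described above can be sketched like this. It is a minimal illustration rather than the script's own code; the helper names `read_fasta` and `subset` are made up for the example, and the record name is assumed to be everything after the `>` on the header line.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def subset(lines, stems):
    """Return the records whose names begin with any of the stems."""
    # str.startswith accepts a tuple and matches if any entry is a prefix.
    stems = tuple(s.strip() for s in stems if s.strip())
    return [(n, s) for n, s in read_fasta(lines) if n.startswith(stems)]
```

Because this is a plain prefix test, a stem such as "JN031811-1" also matches names beginning with "JN031811-10"; include a trailing separator in the stem if that matters.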

fnaview.py :

USAGE:

python fnaview.py fastafile.fna

Each sample ID in fastafile will be printed once. To count the number of samples in a file, pipe the output to wc:

python fnaview.py fastafile.fna | wc -l

To check whether two files (or a FASTA file and a qual file) contain the same sample names, do this:

python fnaview.py fastafile.fna | sort > f1.fna
python fnaview.py other_fastafile.fna | sort > f2.fna
diff f1.fna f2.fna

If the diff command produces no output, the two files contain the same set of samples.
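When the two lists do differ, it can help to see which IDs are missing from which side. The sketch below does that for two ID lists such as the sorted outputs saved above; it relies only on there being one ID per line, and the function name `compare_ids` is hypothetical.

```python
def compare_ids(ids_a, ids_b):
    """Return (only_in_a, only_in_b) for two iterables of ID lines."""
    a = {line.strip() for line in ids_a if line.strip()}
    b = {line.strip() for line in ids_b if line.strip()}
    return sorted(a - b), sorted(b - a)


# Example use with the files written by the commands above:
# with open("f1.fna") as f1, open("f2.fna") as f2:
#     only_first, only_second = compare_ids(f1, f2)
```

Two empty lists correspond to the case where diff prints nothing.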

checkseqs.py :

USAGE:

python checkseqs.py seqs.fna

Where seqs.fna is the fasta file generated by split_libraries.py.

The script checks that the sample ID assigned by QIIME matches the original, and prints any mismatches. You may wish to redirect the output to a file: if there is one mismatch, there will likely be many.

The script prints no output if it finds no mismatched sequences.

To save the mismatches to a file for inspection, use the script like so:

python checkseqs.py seqs.fna > mismatches.fna
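The kind of check involved can be sketched as follows. This is an illustration under assumptions, not the script's actual logic: it assumes each header in seqs.fna has the form `>SampleID_N OriginalReadID ...`, and that the original read ID begins with the sample name supplied by the sequencing facility. The function name `find_mismatches` is made up for the example.

```python
def find_mismatches(lines):
    """Return (assigned_sample, original_id) pairs that disagree."""
    mismatches = []
    for line in lines:
        if not line.startswith(">"):
            continue  # only header lines carry IDs
        fields = line[1:].split()
        if len(fields) < 2:
            continue  # no original ID recorded on this header
        # Drop the trailing _N counter to recover the assigned sample ID.
        assigned = fields[0].rsplit("_", 1)[0]
        original = fields[1]
        # Assumption: a correct original ID starts with the sample ID.
        if not original.startswith(assigned):
            mismatches.append((assigned, original))
    return mismatches
```

Taking `len()` of the returned list gives the number of mismatches.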

theJohnnyBrown/ferristools

Folders and files

Latest commit

History

Repository files navigation

FerrisTools

create_mapping.py

USAGE:

suggested workflow:

fasta.py :

USAGE:

ARGS:

Odd cases:

seq_subset.py :

USAGE:

fnaview.py :

USAGE:

USAGE:

About

Resources

Stars

Watchers

Forks

Languages