dib-MMETSP

Output files available for download:

Transcriptome assemblies (fasta):

Annotations (gff):

Table of one annotation name per transcript ID, taking the best hit after sorting by e-value (e-value < 1e-05) (.csv):

Peptide translations (fasta):

Expression quantification (salmon output):

All files combined:

Pipeline scripts:
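As a rough illustration of how a one-annotation-per-transcript table like the one listed above could be derived, the sketch below keeps the lowest e-value hit under the 1e-05 cutoff for each transcript. The column names (`transcript_id`, `name`, `evalue`) and the example records are assumptions made for illustration; they are not taken from the actual annotation files or from the code that produced them.

```python
import csv
import io

EVALUE_CUTOFF = 1e-05  # threshold quoted in the file description above


def best_annotation_per_transcript(rows, cutoff=EVALUE_CUTOFF):
    """Return {transcript_id: annotation_name}, keeping the lowest e-value hit.

    `rows` is an iterable of dicts with the hypothetical keys
    'transcript_id', 'name' and 'evalue'.
    """
    best = {}
    for row in rows:
        evalue = float(row["evalue"])
        if evalue >= cutoff:
            continue  # keep only hits below the cutoff
        tid = row["transcript_id"]
        if tid not in best or evalue < best[tid][0]:
            best[tid] = (evalue, row["name"])
    return {tid: name for tid, (_, name) in best.items()}


# Made-up example records, not real MMETSP annotations.
example = io.StringIO(
    "transcript_id,name,evalue\n"
    "t1,geneA,1e-30\n"
    "t1,geneB,1e-10\n"
    "t2,geneC,0.5\n"
    "t3,geneD,1e-06\n"
)
best_hits = best_annotation_per_transcript(csv.DictReader(example))
print(best_hits)
```

Transcripts with no hit under the cutoff (`t2` here) are simply absent from the result.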

Citation:

Johnson, Lisa K., Alexander, Harriet, & Brown, C. Titus. (2018). MMETSP re-assemblies [Data set]. Zenodo. https://doi.org/10.5281/zenodo.740440

MMETSP pipeline

This repository contains the pipeline code used to generate re-assemblies of the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP). Originally: https://github.com/ljcohen/MMETSP

This pipeline was constructed to automate the Eel Pond khmer protocols over a large-scale RNA-seq data set. The data set comes from MMETSP, which contains 678 cultured samples of 306 pelagic and endosymbiotic marine eukaryotic species representing more than 40 phyla (Keeling et al. 2014).

The input file is SraRunInfo.csv, a metadata spreadsheet downloaded from NCBI SRA that contains the URL and sample ID for each record. The scripts were designed for the high-performance computing cluster at Michigan State University (iCER) and are launched in parallel through the Portable Batch System (PBS) scheduler. They use SraRunInfo.csv to download and extract the data, run quality control (QC), trim reads, apply digital normalization (diginorm), and then assemble with Trinity. If you want to use these scripts, be aware that they will need modifications specific to your system.
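To make the role of the metadata spreadsheet concrete, here is a minimal sketch of reading accession and URL pairs from an SraRunInfo-style CSV. It assumes the columns are named `Run` and `download_path`; check the header of your own file before relying on those names. The rows shown are placeholders, and this is not the code used in getdata.py.

```python
import csv
import io


def read_run_info(handle):
    """Yield (run_accession, download_url) pairs from an SraRunInfo-style CSV.

    Assumes columns named 'Run' and 'download_path'; these names are an
    assumption here, so check them against your own file.
    """
    for row in csv.DictReader(handle):
        run = (row.get("Run") or "").strip()
        url = (row.get("download_path") or "").strip()
        if run and url:  # skip blank or incomplete rows
            yield run, url


# Placeholder rows for illustration only (not real accessions or URLs).
sample_csv = io.StringIO(
    "Run,download_path,ScientificName\n"
    "SRR0000001,https://example.org/SRR0000001,Species one\n"
    "SRR0000002,https://example.org/SRR0000002,Species two\n"
    ",,\n"
)
runs = list(read_run_info(sample_csv))
print(runs)
```

With a real file, the same function can be called on `open("SraRunInfo.csv", newline="")` instead of the in-memory example.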

The main pipeline scripts in this repository:

getdata.py, downloads data from NCBI and organizes it into individual directories for each sample/accession ID
trim_qc.py, trims reads for quality and interleaves reads
diginorm_mmetsp.py, runs normalize-by-median and filter-abund from khmer, renames files, and combines orphans
assembly.py, runs the Trinity de novo transcriptome assembly software
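Because these stages are submitted to a PBS scheduler, adapting them to another cluster mostly means changing how job scripts are written. The sketch below shows one way to build the text of a PBS job script for a single command. The resource values, the job name, and the `trim_sample.py` script name are hypothetical placeholders; the repository's own clusterfunc.py may handle this differently.

```python
import shlex


def make_pbs_script(job_name, command, walltime="04:00:00", ppn=8, mem="32gb"):
    """Build the text of a PBS job script that runs one command.

    Resource values are placeholders; adjust them for your scheduler.
    """
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",
        f"#PBS -l walltime={walltime},nodes=1:ppn={ppn},mem={mem}",
        "#PBS -j oe",  # merge stdout and stderr into one log
        "cd $PBS_O_WORKDIR",  # start in the directory qsub was called from
        " ".join(shlex.quote(part) for part in command),
    ]
    return "\n".join(lines) + "\n"


# Hypothetical per-sample job; the script name and argument are examples only.
script = make_pbs_script("trim_SAMPLE1", ["python", "trim_sample.py", "SAMPLE1"])
print(script)
```

The resulting text could be written to a file and submitted with `qsub`; generating the script as a string first makes it easy to inspect before anything is sent to the queue.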

Annotation and expression counts (run separately):

dammit.py, annotation https://github.com/camillescott/dammit/tree/master/dammit
salmon.py, runs salmon reference-free transcript quantification https://github.com/COMBINE-lab/salmon
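For readers who want to inspect the salmon output, the following sketch loads TPM values from a quant.sf file, which is a tab-separated table with the columns Name, Length, EffectiveLength, TPM, and NumReads. The contig names and numbers are made up for illustration, and the function is not part of salmon.py.

```python
import csv
import io


def read_quant_sf(handle):
    """Return {transcript_name: TPM} from a salmon quant.sf file handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Name"]: float(row["TPM"]) for row in reader}


# Made-up numbers in the quant.sf column layout.
example = io.StringIO(
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "contig_1\t1200\t1050.0\t12.5\t300.0\n"
    "contig_2\t800\t650.0\t0.0\t0.0\n"
)
tpm = read_quant_sf(example)
expressed = [name for name, value in tpm.items() if value > 0]
print(expressed)
```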

Additional scripts (run separately):

rapclust.py, clustering contigs https://github.com/COMBINE-lab/rapclust
busco.py, assessing assembly and annotation completeness with single-copy orthologs http://busco.ezlab.org/
clusterfunc.py, cluster control module
sourmash.py, MinHash signatures to cluster unassembled reads https://github.com/dib-lab/sourmash/tree/v0.9.4
transdecoder.py, translate nucleotide contigs to amino acid contigs http://transdecoder.github.io/
transrate.py, evaluate assembly with reads http://hibberdlab.com/transrate/
transrate_reference.py, evaluate assembly with reference assembly http://hibberdlab.com/transrate/

Usage:

Clone this repo

git clone https://github.com/dib-lab/dib-MMETSP.git

Edit dibMMETSP_configuration.py with absolute path names specific to your system. The included SraRunInfo.csv was obtained from NCBI for BioProject accession PRJNA231566. This code could be used with a SraRunInfo.csv from any collection of SRA records from NCBI or ENA.
Run the main Python script

python main.py
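Since the configuration expects absolute paths, a quick check before launching can catch relative ones early. The sketch below is an assumption-laden example: the setting names `data_dir` and `run_info` are hypothetical and do not reflect the actual variables in dibMMETSP_configuration.py.

```python
import os


def find_relative_paths(settings):
    """Return the names of settings whose value is not an absolute path."""
    return [name for name, path in settings.items() if not os.path.isabs(path)]


# Hypothetical setting names; the real variables live in dibMMETSP_configuration.py.
settings = {
    "data_dir": "/mnt/scratch/mmetsp/data",
    "run_info": "SraRunInfo.csv",
}
problems = find_relative_paths(settings)
print(problems)
```

Any name reported here points at a path that should be rewritten in absolute form before running the pipeline.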

References

Keeling et al. 2014: http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1001889

Supporting information with methods description: http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1001889#s6

Preliminary assembly protocol run by NCGR: https://github.com/ncgr/rbpa

MMETSP website: http://marinemicroeukaryotes.org/

iMicrobe project with data and combined assembly downloads: ftp://ftp.imicrobe.us/projects/104/

Blog posts: https://monsterbashseq.wordpress.com/2016/09/13/mmetsp-re-assemblies/

http://ivory.idyll.org/blog/2016-mmetsp-a-first-look.html

