String-overlap Assembly of GEnomes
NOTE: Please use SAGE2 instead; SAGE is no longer maintained.
SAGE is a new de novo genome assembler based on string-overlap graphs. To run SAGE, first correct the reads in the input dataset using RACER, then run:
SAGE [inputFile(s)] [outputDir] [minOverlapLength]
where
- SAGE is the binary for your platform, e.g., SAGE_Linux
- [inputFile(s)] is the list of input files containing the (corrected) reads
- [outputDir] is the directory where the assembly is written
- [minOverlapLength] is the minimum overlap length between reads
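To make the role of [minOverlapLength] concrete, here is a minimal sketch of the basic test behind a string-overlap assembler: two reads overlap if a suffix of one equals a prefix of the other and that shared part is at least the minimum overlap length. This is a naive illustration only, not SAGE's actual algorithm (real assemblers use indexes rather than this quadratic scan), and the function name is hypothetical.

```python
def longest_suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix of `b`,
    or 0 if no such overlap is at least `min_len` characters long.

    Only proper overlaps (shorter than both reads) are considered; a full-length
    match would mean one read is contained in the other.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    max_len = min(len(a), len(b)) - 1
    # Try the longest candidate first so the first match is the longest overlap.
    for length in range(max_len, min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


# "ACGGT" ends the first read and starts the second: an overlap of length 5.
print(longest_suffix_prefix_overlap("ACGTACGGT", "ACGGTTTAA", 4))  # 5
print(longest_suffix_prefix_overlap("ACGTACGGT", "ACGGTTTAA", 6))  # 0
```

Raising the minimum overlap length discards short, less reliable overlaps, as the second call shows.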
SAGE assumes that all reads have the same length and that paired reads are interleaved, i.e., each read is immediately followed by its mate.
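The input assumptions above can be checked before running the assembler. The following is a minimal sketch, assuming the reads have already been parsed into in-memory lists of sequences (the on-disk file format SAGE expects is not specified here). Both `interleave` and `check_sage_input` are hypothetical helpers, not part of SAGE.

```python
from typing import List, Sequence


def interleave(mates1: Sequence[str], mates2: Sequence[str]) -> List[str]:
    """Merge two mate lists into one interleaved list: m1[0], m2[0], m1[1], m2[1], ..."""
    if len(mates1) != len(mates2):
        raise ValueError("mate lists must contain the same number of reads")
    out: List[str] = []
    for first, second in zip(mates1, mates2):
        out.extend((first, second))
    return out


def check_sage_input(reads: Sequence[str], paired: bool = True) -> None:
    """Raise ValueError if the reads violate the stated assumptions:
    equal read lengths and, for paired data, complete interleaved pairs."""
    if not reads:
        raise ValueError("no reads given")
    expected = len(reads[0])
    for i, read in enumerate(reads):
        if len(read) != expected:
            raise ValueError(f"read {i} has length {len(read)}, expected {expected}")
    # Interleaved pairs imply an even number of reads.
    if paired and len(reads) % 2 != 0:
        raise ValueError("odd number of reads; pairs cannot be fully interleaved")


reads = interleave(["AAAA", "CCCC"], ["GGGG", "TTTT"])
print(reads)  # ['AAAA', 'GGGG', 'CCCC', 'TTTT']
check_sage_input(reads)  # passes silently
```

Failing early on a length mismatch or an incomplete pair is cheaper than discovering the problem in a poor assembly.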
Assemblies are available for the following datasets used in the paper:
- B.subtilis
- C.trachomatis
- S.pseudopneumoniae
- F.tularensis
- L.interrogans
- P.gingivalis
- E.coli
- C.thermocellum
- C.elegans
If you use SAGE, please cite:
L. Ilie, B. Haider, M. Molnar, R. Solis-Oba, SAGE: String-overlap Assembly of GEnomes, BMC Bioinformatics 15 (2014) 302.