forked from GDKO/SCUBAT2

Scaffolding contigs with transcripts

photocyte/SCUBAT2

 
 


SCUBAT2

Overview

SCUBAT2 (Scaffolding Contigs Using BLAST And Transcripts v2) uses transcriptome or proteome information to scaffold a genome assembly. It was inspired by the original SCUBAT algorithm by Ben Elsworth.

Requirements

Python Libraries

Biopython - to parse the BLAST XML file

NumPy - to calculate some statistics

Details

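SCUBAT2 requires a BLAST XML file (-outfmt 5) of the transcripts aligned against the contigs. Because -db expects a BLAST database, build one from the contigs first (for example with makeblastdb -in contigs.fa -dbtype nucl), then run: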

blastn -query transcripts.fa -db contigs.fa -evalue 1e-25 -outfmt 5 -out blast.xml
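To make the role of the BLAST XML file concrete, here is a minimal sketch that reads such a file and lists the transcripts hitting more than one contig, which are the natural candidates for joining contigs. It is not SCUBAT2's code: it uses the standard library's xml.etree.ElementTree instead of Biopython so that it runs without extra dependencies, and the function names and the sample XML are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def contigs_per_transcript(xml_text):
    """Map each query transcript to the set of contigs it hits.

    Assumes the standard BLAST XML layout (-outfmt 5), where each
    <Iteration> is one query and each <Hit> is one subject sequence.
    Hit_def typically carries the FASTA header of the subject.
    """
    root = ET.fromstring(xml_text)
    hits = defaultdict(set)
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        for hit in iteration.iter("Hit"):
            hits[query].add(hit.findtext("Hit_def"))
    return dict(hits)

def scaffold_candidates(hits):
    """Keep only transcripts whose hits span two or more contigs."""
    return {q: sorted(c) for q, c in hits.items() if len(c) > 1}

# Hypothetical, heavily trimmed BLAST XML used only for this example.
SAMPLE_XML = """<BlastOutput><BlastOutput_iterations>
<Iteration><Iteration_query-def>tx1</Iteration_query-def><Iteration_hits>
  <Hit><Hit_def>contig_A</Hit_def></Hit>
  <Hit><Hit_def>contig_B</Hit_def></Hit>
</Iteration_hits></Iteration>
<Iteration><Iteration_query-def>tx2</Iteration_query-def><Iteration_hits>
  <Hit><Hit_def>contig_A</Hit_def></Hit>
</Iteration_hits></Iteration>
</BlastOutput_iterations></BlastOutput>"""

if __name__ == "__main__":
    all_hits = contigs_per_transcript(SAMPLE_XML)
    print(scaffold_candidates(all_hits))
```

For a real blast.xml, the file contents can be read and passed to contigs_per_transcript; for very large files an incremental parser would be a better fit than loading everything at once.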

When the transcripts and the genome come from the same species, the default identity cutoff should be adequate.
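As an illustration of what an identity cutoff does, the sketch below computes percent identity from the identical-base count and alignment length that BLAST reports for each HSP, and filters on a threshold. The function names and the 95% threshold are assumptions chosen for the example; the README does not state SCUBAT2's actual default.

```python
def percent_identity(identical, align_len):
    """Percent identity of one HSP: identical positions over alignment length."""
    if align_len <= 0:
        raise ValueError("alignment length must be positive")
    return 100.0 * identical / align_len

def passes_cutoff(identical, align_len, min_identity=95.0):
    """Return True if the HSP meets the identity threshold.

    min_identity=95.0 is a hypothetical value, not SCUBAT2's documented default.
    """
    return percent_identity(identical, align_len) >= min_identity

if __name__ == "__main__":
    print(passes_cutoff(190, 200))  # an HSP with 190 identical bases out of 200
    print(passes_cutoff(80, 100))   # a more divergent HSP
```

Transcripts from a different species would generally align with lower identity, which is why the cutoff may need adjusting in that case.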

The user must specify the maximum allowed intron size (e.g. ~20,000 bp for nematode species). Alternatively, the user can run the program with --intron_size_run, which creates the file intron_size containing the intron sizes calculated from the mapped transcripts.
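One way to picture how intron sizes can be derived from mapped transcripts: when consecutive pieces of a transcript align to the same contig, the gap between them on the contig approximates an intron. The sketch below follows that idea under simplifying assumptions (forward strand only, one contig, HSPs given as hypothetical tuples); it is not the calculation SCUBAT2 itself performs, and it uses the standard statistics module rather than NumPy to stay dependency-free.

```python
from statistics import median

def intron_sizes(hsps):
    """Estimate intron sizes from HSPs of one transcript on one contig.

    Each HSP is (query_start, query_end, contig_start, contig_end),
    1-based inclusive, forward strand. This tuple layout is an assumption
    made for the example.
    """
    ordered = sorted(hsps, key=lambda h: h[0])  # order along the transcript
    sizes = []
    for prev, nxt in zip(ordered, ordered[1:]):
        gap = nxt[2] - prev[3] - 1  # contig bases between the two HSPs
        if gap > 0:
            sizes.append(gap)
    return sizes

if __name__ == "__main__":
    # Hypothetical alignments of a single transcript.
    hsps = [(1, 300, 1000, 1299), (301, 500, 2300, 2499), (501, 900, 30000, 30399)]
    sizes = intron_sizes(hsps)
    max_intron = 20000  # the value passed to -max in the README example
    print("all gaps:", sizes)
    print("within limit:", [s for s in sizes if s <= max_intron])
    print("median gap:", median(sizes))
```

Inspecting the distribution of such gaps is a reasonable way to pick a maximum intron size for a species where no prior estimate is available.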

Example command

SCUBAT_v2.py -b [blast.xml] -f [assembly.file] -max 20000
