Skip to content

Scaffolding of genomic assemblies with RNA seq data

Notifications You must be signed in to change notification settings

standardgalactic/BESST_RNA

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 

Repository files navigation

BESST_RNA

Scaffolding of genomic assemblies with RNA seq data.

INPUT:

Required arguments:

  • -c Path to a contig file.

  • -f Path to bamfile of RNA-seq data mapped to assembly.

  • -o Path to location for the output.

Optional arguments

  • -e The least amount of witness links that is needed to create a link edge in graph (one for each library).

  • -w real weight parameter. Given multiple edges from a given contig(-end) inference on which one to be chosen needs to be done. -w is the minimum threshold of the relative weight for the dominating edge (most amount of links) to the second most dominating edge for a given contig-end. If weight is less than specified -w, no edge will be chosen to extend this contig. Default value is 3.

  • -T Upper threshold on the distance between RNA reads. RNA reads with distance over this threshold is not considered to be mapped correctly. e.g. could be set to the upper estimated intron size.

  • -k Minimum size of contig to be included in the scaffolding.

  • -d Check for sequencing duplicates and count only one of them (when computing nr of links) if they are occurring. (default on = 1).

  • -z Coverage cutoff for repeat classification > Should be set high (i.e repeat detection turned of) for RNA scaffolding since repeat classification is not trusted with this type of data. [ e.g. -z 1000 says that contigs with coverage over 100 will be discarded from scaffolding.]

  • --mapq Lowest mapping quality allowed in order to use the read. This value is compared to the mapping quality column in the BAM file. Default = 10.

EXAMPLE RUN:

From //BESST_RNA/src/ , run

$ python Main.py 1 -c /path/to/contigfile.fa -f /path/to/bamfile -o /path/to/output -e 3 -T 20000 -k 500 -d 1 -z 1000

NOTE:

Mapping reads: BESST assumes "PE orientation" the following mapping of RNA reads:

PE: s t ------> <-------

About

Scaffolding of genomic assemblies with RNA seq data

Resources

Code of conduct

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 99.2%
  • Shell 0.8%