daler/seqprint

seqprint

Flexible and extensible pretty-printing of genomic sequences, given a BED file of regions and a FASTA file of sequences.

For each region, the output includes a row of position numbers, a row of tick marks, and the sequence itself. Any additional user-specified functions are then applied to generate additional "tracks" below the sequence.
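
The layout described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not seqprint's implementation: the names ruler, render and gc_track are hypothetical and are not part of the seqprint API, and the 80-column block width is an assumption read off the example output further down.

```python
def ruler(start, end, step=10):
    """Build the number row and tick row for positions start..end-1.

    Hypothetical helper, not part of seqprint.
    """
    width = end - start
    numbers = [" "] * width
    ticks = [" "] * width
    for pos in range(start, end):
        if pos % step == 0:
            i = pos - start
            ticks[i] = "|"
            # Write the position label starting at the tick column,
            # dropping any characters that would run past the block.
            for j, ch in enumerate(str(pos)):
                if i + j < width:
                    numbers[i + j] = ch
    return "".join(numbers), "".join(ticks)


def gc_track(seq):
    """Example track function: mark G/C bases with '*', others with '.'."""
    return "".join("*" if base in "GCgc" else "." for base in seq)


def render(seq, tracks=(), width=80):
    """Print-ready text: ruler, sequence, then one row per track, per block."""
    # Each track function sees the whole sequence and returns a row
    # of the same length; rows are sliced per block afterwards.
    rows = [track(seq) for track in tracks]
    lines = []
    for offset in range(0, len(seq), width):
        chunk = seq[offset:offset + width]
        numbers, ticks = ruler(offset, offset + len(chunk))
        lines.extend([numbers, ticks, chunk])
        lines.extend(row[offset:offset + width] for row in rows)
        lines.append("")  # blank line between blocks
    return "\n".join(lines)


if __name__ == "__main__":
    print(render("ACGT" * 5, tracks=[gc_track], width=10))
```

Computing each track once over the full sequence, and slicing it afterwards, keeps the track functions unaware of the block width.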

Originally written to help identify single-bp positions within overlapping CTCF binding sites to choose for experimental follow-up, and later generalized for other uses.

See the docstring for seqprint.seqprinter for more info on subclassing.

Example usage (uses MotifPrinter, which is a subclass of BasePrinter):

>>> from seqprint import MotifPrinter
>>> from seqprint.helpers import data_file
>>> # get example data
>>> regions = data_file('regions.bed')
>>> fasta = data_file('chr11_subset.fa')
>>> jaspar_file = data_file('ctcf.jaspar')
>>> jaspar_thresh = 1.5
>>> x = MotifPrinter(regions, fasta, jaspar_file=jaspar_file,
...     jaspar_thresh=jaspar_thresh)
>>> x.printseq()

Example output:

region1 chr11:0-200
0         10        20        30        40        50        60        70        
|         |         |         |         |         |         |         |         
AGGGCAAAGATGGAAGTTTAAAGCCAGCCATTTCTAAGGGTTAGCGGCTTGCTCAATTCCCTGGGGGCCTGGCATATCTA
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
......................cccTTAgAAAtgGCTggc (1.58 -)...............................
................................................................................
........tggCTTtAAActTCCatc (1.53 -).............................................
..................tagAAAtGGCtgGCTtta (1.51 -)...................................
................................................................................

80        90        100       110       120       130       140       150       
|         |         |         |         |         |         |         |         
GTATGGCCAGGAGATGGCAGTGTTGAAGCATCTTCTGTTAGTAAAACACATCCCTGTCTCTCAGAGCCCCAGAGATAGGG
...............................................................accCTAtCTCtgGGGct
................................................................................
...tggCCAggAGAtGGCagt (1.69 +)..................................................
...................................................gggCTCtGAGagACAggg (1.62 -)..
......................................................tgtCTCtcAGAgCCCcag (1.61 +
..........................ttaCTAaCAGaaGATgct (1.60 -)...........................
.tgcCATcTCCtgGCCata (1.59 -)....................................................
................................................................................
................................................................agcCCCagAGAtAGGg
................................................................................
................................................................................
............................................aacACAtcCCTgTCTctc (1.50 +).........

160       170       180       190       
|         |         |         |         
TTTATCTCGTTCTCACTTATTTGACAAAGAAAAAGGACAC
c (1.76 -)..............................
.......gtcAAAtAAGtgAGAacg (1.72 -)......
........................................
........................................
).......................................
........................................
........................................
........................................
tt (1.56 +).............................
........................................
........................................
........................................
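
In the output above, an annotation that starts near the end of one 80-column block continues at the start of the same row in the next block (for example, the hit scored 1.76 ends with "c (1.76 -)" in the third block). One way to get this behavior, sketched here as an assumption about the layout and not as seqprint's actual code, is to write each label into a full-length row of dots and only then cut the row into blocks. The names place_label and chunks are hypothetical, and the offset 143 is read off the example output.

```python
def place_label(length, start, label, fill="."):
    """Return a row of `fill` characters with `label` written at `start`.

    Hypothetical helper, not part of seqprint. Characters that would
    fall past the end of the row are dropped.
    """
    row = [fill] * length
    for i, ch in enumerate(label):
        if start + i < length:
            row[start + i] = ch
    return "".join(row)


def chunks(row, width=80):
    """Cut a full-length row into consecutive blocks of `width` columns."""
    return [row[i:i + width] for i in range(0, len(row), width)]


if __name__ == "__main__":
    # A 200 bp region with one hit whose label starts at offset 143.
    row = place_label(200, 143, "accCTAtCTCtgGGGctc (1.76 -)")
    for block in chunks(row):
        print(block)
```

Because the label is placed before the row is split, no special handling is needed for hits that straddle a block boundary.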
