Flexible and extensible pretty-printing of genomic sequences, given a BED file of regions and a FASTA file.
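Both inputs are standard formats. A BED file supplies the regions (0-based, half-open start/end coordinates plus an optional name column), and a FASTA file supplies the sequence. The sketch below shows roughly what that input handling involves. The helpers `read_bed` and `read_fasta` are hypothetical and are not part of seqprint, which may well rely on a library for this. The toy data is copied from the start of the example output further down.

```python
import io


def read_bed(handle):
    """Yield (name, chrom, start, end) for each BED record.

    BED coordinates are 0-based and half-open, so start=0, end=200 covers
    the first 200 bases of the chromosome.
    """
    for line in handle:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # The 4th (name) column is optional; fall back to the coordinates.
        name = fields[3] if len(fields) > 3 else f"{chrom}:{start}-{end}"
        yield name, chrom, start, end


def read_fasta(handle):
    """Return a dict mapping each record ID (first word of its header) to its sequence."""
    seqs, current, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                seqs[current] = "".join(chunks)
            current, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if current is not None:
        seqs[current] = "".join(chunks)
    return seqs


# Tiny in-memory example (hypothetical files, not the bundled data).
bed = io.StringIO("chr11\t0\t20\tregion1\n")
fasta = io.StringIO(">chr11\nAGGGCAAAGA\nTGGAAGTTTAAAGCC\n")
seqs = read_fasta(fasta)
for name, chrom, start, end in read_bed(bed):
    print(f"{name} {chrom}:{start}-{end}")  # region1 chr11:0-20
    print(seqs[chrom][start:end])           # AGGGCAAAGATGGAAGTTTA
```

The printed header has the same `name chrom:start-end` shape as the `region1 chr11:0-200` line in the example output.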
Each region is printed with position numbers, tick marks, and its sequence. Any additional user-specified functions are then applied to generate extra "tracks" beneath it.
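The layout described above can be sketched in plain Python: a number line, a tick line, and the sequence, followed by one row per track. This is an illustration under stated assumptions, not seqprint's implementation. `ruler`, `render`, and `ggg_track` are hypothetical names. The convention that a track function takes the region's sequence and returns full-length rows is also an assumption. Ticks every 10 bp and 80-column blocks mirror the example output. The demo below uses 10-column blocks to keep it short.

```python
def ruler(start, width, step=10):
    """Build the number line and tick line for positions start .. start+width-1.

    Each multiple of `step` gets a "|" in the tick line and its position,
    left-aligned on the tick, in the number line.
    """
    numbers = [" "] * width
    ticks = [" "] * width
    for col in range(width):
        pos = start + col
        if pos % step == 0:
            ticks[col] = "|"
            for i, ch in enumerate(str(pos)):
                if col + i < width:  # clip labels at the right edge
                    numbers[col + i] = ch
    return "".join(numbers).rstrip(), "".join(ticks).rstrip()


def ggg_track(seq):
    """Hypothetical example track: mark every GGG run, dots elsewhere."""
    row = ["."] * len(seq)
    i = seq.find("GGG")
    while i != -1:
        row[i:i + 3] = "GGG"
        i = seq.find("GGG", i + 1)
    return ["".join(row)]


def render(seq, start=0, width=80, tracks=()):
    """Lay out `seq` in blocks of `width` columns, each followed by its track rows.

    Assumed convention: a track is a callable that takes the whole region
    sequence and returns a list of rows, each as long as the sequence.
    """
    rows = [row for track in tracks for row in track(seq)]
    lines = []
    for offset in range(0, len(seq), width):
        chunk = seq[offset:offset + width]
        numbers, ticks = ruler(start + offset, len(chunk))
        lines += [numbers, ticks, chunk]
        lines += [row[offset:offset + width] for row in rows]
    return "\n".join(lines)


print(render("AGGGCAAAGATGGAAGTTTA", width=10, tracks=[ggg_track]))
```

Because tracks are plain callables, new kinds of annotation can be added without touching the ruler or sequence logic. That is the kind of extensibility the description refers to, although seqprint itself organizes it through subclassing.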
Originally written to help choose single-bp positions within overlapping CTCF binding sites for experimental follow-up, then generalized for other uses.
See the docstring of seqprint.seqprinter for more information on subclassing.
Example usage (uses MotifPrinter, which is a subclass of BasePrinter):
>>> from seqprint import MotifPrinter
>>> from seqprint.helpers import data_file
>>> # get example data
>>> regions = data_file('regions.bed')
>>> fasta = data_file('chr11_subset.fa')
>>> jaspar_file = data_file('ctcf.jaspar')
>>> jaspar_thresh = 1.5
>>> x = MotifPrinter(regions, fasta, jaspar_file=jaspar_file,
... jaspar_thresh=jaspar_thresh)
>>> x.printseq()
Example output:
region1 chr11:0-200
0         10        20        30        40        50        60        70
|         |         |         |         |         |         |         |
AGGGCAAAGATGGAAGTTTAAAGCCAGCCATTTCTAAGGGTTAGCGGCTTGCTCAATTCCCTGGGGGCCTGGCATATCTA
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
......................cccTTAgAAAtgGCTggc (1.58 -)...............................
................................................................................
........tggCTTtAAActTCCatc (1.53 -).............................................
..................tagAAAtGGCtgGCTtta (1.51 -)...................................
................................................................................
80        90        100       110       120       130       140       150
|         |         |         |         |         |         |         |
GTATGGCCAGGAGATGGCAGTGTTGAAGCATCTTCTGTTAGTAAAACACATCCCTGTCTCTCAGAGCCCCAGAGATAGGG
...............................................................accCTAtCTCtgGGGct
................................................................................
...tggCCAggAGAtGGCagt (1.69 +)..................................................
...................................................gggCTCtGAGagACAggg (1.62 -)..
......................................................tgtCTCtcAGAgCCCcag (1.61 +
..........................ttaCTAaCAGaaGATgct (1.60 -)...........................
.tgcCATcTCCtgGCCata (1.59 -)....................................................
................................................................................
................................................................agcCCCagAGAtAGGg
................................................................................
................................................................................
............................................aacACAtcCCTgTCTctc (1.50 +).........
160       170       180       190
|         |         |         |
TTTATCTCGTTCTCACTTATTTGACAAAGAAAAAGGACAC
c (1.76 -)..............................
.......gtcAAAtAAGtgAGAacg (1.72 -)......
........................................
........................................
).......................................
........................................
........................................
........................................
tt (1.56 +).............................
........................................
........................................
........................................
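In the output above, a track row that runs past the end of an 80-column block continues at the start of the same row in the next block. For example, the `(1.61 +` that ends the fifth track row of the second block is closed by the `)` that opens the fifth row of the third block. This behavior follows naturally if each track row is built at full region length and then sliced into blocks. The sketch below illustrates that mechanism under this assumption. It is not seqprint's code, and `annotation_row` and `wrap` are hypothetical names. The position and text reproduce the `tgtCTCtcAGAgCCCcag (1.61 +)` hit, which starts 54 bp into the second block.

```python
def annotation_row(length, pos, text, fill="."):
    """Return a row of `fill` characters with `text` written starting at `pos`.

    Characters that would fall past `length` are dropped.
    """
    row = [fill] * length
    for i, ch in enumerate(text):
        if pos + i < length:
            row[pos + i] = ch
    return "".join(row)


def wrap(row, width=80):
    """Slice a full-length row into consecutive blocks of `width` characters."""
    return [row[i:i + width] for i in range(0, len(row), width)]


# The hit starts 54 bp into the second block, i.e. at offset 80 + 54 = 134
# in the 200 bp region.
row = annotation_row(200, 134, "tgtCTCtcAGAgCCCcag (1.61 +)")
blocks = wrap(row)
print(blocks[1])  # 54 dots, then the motif text ending in "(1.61 +"
print(blocks[2])  # ")" followed by 39 dots
```

The label is simply clipped at the region boundary, so an annotation near the end of a region would be truncated rather than wrapped.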