Skip to content

nhoffman/alnvu

Repository files navigation

alnvu

alnvu makes a multiple alignment of biological sequences more easily readable by condensing it and highlighting variability.

Produces formatted multiple alignments in plain text, html, and pdf.

image

authors

  • Noah Hoffman
  • Chris Small
  • Connor McCoy
  • Tim Holland

dependencies

Required:

Optional:

installation

Installation is easiest using pip:

pip install alnvu

To install from the sources on GitHub, first clone the repository.

Using setup.py:

cd alnvu
python setup.py install

Using pip:

cd alnvu
pip install .

examples

% cd alnvu

The default output. Note that columns are numbered (column 8 is the first shown, column 122 is the last):

% ./av -w 80 testfiles/aln.fasta | head -n 15
     # 00000000000000000000000000000000000000000000000000000000000000000000000000000000
     # 00000000000000000000000000000000000000000000000000000000000000000000000000000000
     # 00111111111122222222223333333333444444444455555555556666666666777777777788888888
     # 89012345678901234567890123456789012345678901234567890123456789012345678901234567
 59735 agagtttgatcctggctcaggacgaacgcTGGCGGCgtGCTTAACACATGCAAGTCGAACGaTgAAgcggtGCTTgcacc
 70875 --------------------------------------------------------------------------------
 58095 agagtttgatcctggctcagagcgaacgcTGGCGGCatGCTTAACACATGCAAGTCGcACGGgtggtttcgGCcatc---
 70854 -----------------------------TGGCGGCagGCcTAACACATGCAAGTCGAgCGGatgAcgggAGCTTgctcc
 62024 agagtttgatcctggctcaggacgaacgcTGGCGGCgtGCTTAACACATGCAAGTCGAACGaTgAAgcctttCggggtgg
 59895 ---------------------------------------------------AAGTCGAACGGTgAAagagAGCTTagctc
 57728 -----------------------------------------------------------tt-------------------
 10734 ---------------------------gcTGaCGGCgtGCTTAACACATGCAAGTCGAACGGgatccattAGCgcttttg
 71041 --------------------------cgcTGGCGGCagGCTTAACACATGCAAGTCGAACGaTgAAgtctAGCTTgctag
 6161O -----------------------------TGGCGGt-gGCcTAACACATGCAAGTCGAACGGatccttcggGaTT-----

The input file can be provided via stdin:

% cat testfiles/aln.fasta | av

Exercising some of the options (show sequence numbers and a consensus; show differences with first sequence, restrict to columns 200-280):

% av testfiles/aln.fasta --number-sequences --consensus --range 200,280 --compare-to 59735
        # 000000000000000000000000000000000000000000000000000000000000000000000000000000000
        # 222222222222222222222222222222222222222222222222222222222222222222222222222222222
        # 000000000011111111112222222222333333333344444444445555555555666666666677777777778
        # 012345678901234567890123456789012345678901234567890123456789012345678901234567890
 1 ==REF==> 59735 TGGGGtG-TTGGTgGAAAGCgttatgga------------GTGGTTTTAGATGGGCTCACGGCCTATCAGCTTGTTGGTGA
 2          70875 gGGt----------GAAAGtGggggaccgcaaggcctc--acGcagcagGAgcGGCcgAtGtCtgATtAGCTaGTTGGTGg
 3          58095 gGcc----------cAAAGCcgaAaG--------------GcGccTTTgGAgcGGCctgCGtCCgATtAGgTaGTTGGTGg
 4          70854 gGGa----------GAAAGCaggggaccttcgggcctt--GcGcTaTcAGATGaGCctAgGtCggATtAGCTaGTTGGTGg
 5          62024 TGGGa-c-ggGGTtaAAAGCtccg----------------GcGGTgaagGATGaGCcCgCGGCCTATCAGCTTGTTGGTGg
 6          59895 TcttcaG-caGcTGGAAAGaaTT-----------------tcGGTcaggGATGaGCTCgCGGCCTATCAGCTTGTTGGTGA
 7          57728 TcGaG-GaTaGaT-GAAAGgtggcctctacatgtaagctatcacTgaagGAgGGGaTtgCGtCtgATtAGCTaGTTGGaGg
 8          10734 TGGGG-t-TgttgGGAAAGgtTTtTt--------------cTGGaTTggGATGGGCTCgCGGCtTATCAGCTTGTTGGTGg
 9          71041 gaGa----------GAAAGgGggcTtttagctc-------tcGcTaaTAGATGaGCctAaGtCggATtAGCTaGTTGGTGg
10          6161O gGGG----------GAAAGatTTA----------------tcGccaTTgGAgcGGCcCgCGtCtgATtAGCTaGTTGGTGg
11      CONSENSUS xGGx------x-x-GAAAGxxxxxxx--------------xcGcTxxxgGATGGGCcxgCGtCxgATtAGCTaGTTGGTGg

The above alignment rendered as colored html (thanks @timholl):

% av testfiles/aln.fasta --number-sequences --consensus --range 200,280 --compare-to 59735 -q --html aln.html

image

Write a single-page pdf file:

% av testfiles/aln.fasta --pdf test.pdf --quiet --blocks-per-page=5

And do you know about seqmagick? If not, run, don't walk to https://github.com/fhcrc/seqmagick and check it out, so that you can do this:

% seqmagick convert testfiles/ae_like.sto --output-format=fasta - | av -cx
       # 000000000000000000000000000000000
       # 445555555555566666666666666667777
       # 990111111155813445566778888991122
       # 791123678914209568907050235891215
  GA05AQR01D2ULR ...............TTGGT.GT..AG...A..
  GA05AQR01DFGSE ........................T.TAAGT..
  GA05AQR01CI0QB ...........A.....................
  GA05AQR01DW22X .TC..G.T.T.......................
  GA05AQR01A5WF4 ....................A........-T..
  GA05AQR01BUV2U ---..............................
  GA05AQR01B1R8I .............T...............CT..
  GA05AQR02JASPX ........A........................
  GCX02B001AYSTJ .............................-TA.
  GCX02B001DP9EQ ............A..........CA.......T
  GCX02B001AFAY1 ..............G..................
  GCX02B002J489C ...-......A......................
  GLKT0ZE01EDLCP AT...ATT.T.......................
  GLKT0ZE02I8LRD ---GA............................
-ref-> CONSENSUS TCTAGCGCGCGGGGACGAACGAGGCGCGCTGGA

About

Reformat and condense multiple sequence alignments to highlight variability

Resources

Stars

Watchers

Forks

Packages

No packages published