pauruihu/baga

 
 

Bacterial and Archaeal Genome Analyser

Novel analyses and wrapped tools pipelined for convenient processing of genome sequences

David Williams

Introduction

The Bacterial and Archaeal Genome Analyser (BAGA, pronounced "baga") is a command-line application and Python 2 package (Python 3 support coming soon) for diverse analyses of genome sequence data, designed to facilitate reproducible research.

Input data can be complete genome sequences and/or paired-end short reads from Illumina sequencers, typically whole genome shotgun libraries. Tasks might include variant calling and resolving the population structure of resequenced pathogen isolates, analysis of evolution experiments, and comparative genomics including phylogenomics.

BAGA is a wrapper for proven third-party tools [1], but it also includes novel algorithms for identifying chromosomal rearrangements and sequence repeats known to increase the likelihood of false positive variant calls [2], the means to filter those probable false positive variants out of a dataset, and the means to create custom pipelines for reproducible analyses [3]. It can also generate various informative plots [4]. BAGA is under active development: new features and much more documentation will be appearing shortly.

  1. e.g. BWA for short read alignment to longer sequences, GATK for variant calling and ClonalFrameML for homologous recombination inference
  2. Variant calls in such regions are unreliable and should be filtered: conventional variant calling algorithms are unaware of misalignments caused by the loss of homology, e.g. near chromosomal rearrangements caused by mobile genetic elements, and might therefore report false positive variant calls. Those regions can be characterised in detail by local de novo assembly of reads and alignment of the resulting contigs to the reference sequence
  3. Researchers can make use of version control and digital object identifiers to generate citable and reproducible analyses for peer-reviewed publication
  4. BAGA can plot all automatically indicated regions such as those prone to misalignment of short reads because of structural differences between a reference sequence and a sampled genome, e.g. a missing prophage (see point 2 above)
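
The filtering idea in point 2 can be illustrated with a minimal sketch. This is not BAGA's implementation or API: the function names, the 1-based inclusive coordinate convention, and the example positions below are all assumptions made for illustration. It only shows how variant positions falling inside regions flagged as unreliable might be separated from the rest.

```python
from bisect import bisect_right


def merge_regions(regions):
    """Merge overlapping or adjacent (start, end) regions.

    Coordinates are assumed to be 1-based and inclusive (an assumption
    of this sketch, not a statement about BAGA's internals).
    """
    merged = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def filter_variants(variant_positions, regions):
    """Split variant positions into (kept, filtered).

    A position is filtered if it lies inside any flagged region.
    """
    merged = merge_regions(regions)
    starts = [start for start, _ in merged]
    kept, filtered = [], []
    for pos in variant_positions:
        # Find the last region starting at or before this position.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos <= merged[i][1]:
            filtered.append(pos)
        else:
            kept.append(pos)
    return kept, filtered


# Hypothetical example data: regions flagged as prone to misalignment
# (e.g. repeats or rearrangement breakpoints) and called variant positions.
unreliable_regions = [(1200, 1850), (1800, 2100), (50000, 50750)]
variant_positions = [310, 1500, 2100, 2101, 50010, 61234]

kept, filtered = filter_variants(variant_positions, unreliable_regions)
print(kept)      # [310, 2101, 61234]
print(filtered)  # [1500, 2100, 50010]
```

Merging the regions first keeps the lookup to a single binary search per variant, which matters when a dataset has many flagged regions and many calls.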

Please see the documentation for more details and step-by-step guides for running various analysis pipelines and making your research more easily reproducible.

Funding

Work on this software was started at The University of Liverpool, UK with funding from The Wellcome Trust (093306/Z/10) awarded to:

  • Dr Steve Paterson (The University of Liverpool, UK)
  • Dr Craig Winstanley (The University of Liverpool, UK)
  • Dr Michael A Brockhurst (The University of York, UK)

License GPLv3+: GNU GPL version 3 or later. This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.
