PAVFinder is a Python package that detects structural variants from de novo assemblies (e.g. those generated by ABySS or Trans-ABySS). It can analyse both genome and transcriptome assemblies, reporting the following event types:
Genome assemblies:
- translocations
- inversions
- duplications
- insertions
- deletions
- simple-repeat expansions/contractions
Transcriptome assemblies:
- gene fusions
- internal tandem duplications (ITD)
- partial tandem duplications (PTD)
- small indels
- simple-repeat expansions/contractions
- skipped exons
- novel exons
- novel introns
- retained introns
- novel splice acceptors/donors
PAVFinder infers variants from non-contiguous (split or gapped) alignments of contig sequences to the reference genome. Contigs are aligned to the reference genome (c2g alignment) using bwa mem for genome assemblies or GMAP for transcriptome assemblies. Read support for each event can be gathered by aligning the reads to the assembly with bwa mem (r2c alignment).
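The Python sketch below illustrates how the two alignment shapes mentioned above can encode events. `find_indels` walks the CIGAR string of a single gapped alignment (SAM format) to report small insertions and deletions, and `classify_split` assigns a coarse event type to two adjacent blocks of a split alignment. This is not PAVFinder's implementation: the function names, the `Segment` record and the classification rules are hypothetical and deliberately simplified (for example, they ignore microhomology at breakpoints).

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# CIGAR operations as defined in the SAM specification.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def find_indels(ref_start: int, cigar: str) -> List[Tuple[str, int, int]]:
    """Report (type, 1-based ref position, size) for each I/D in a gapped alignment.

    Hypothetical helper. Deletion positions are the first deleted reference base;
    insertion positions are the reference base right after the inserted sequence.
    'N' (intron skip, e.g. in spliced GMAP output) is not treated as a deletion.
    """
    events = []
    ref_pos = ref_start  # SAM POS: 1-based leftmost aligned reference base
    for length_str, op in CIGAR_RE.findall(cigar):
        length = int(length_str)
        if op == "D":
            events.append(("deletion", ref_pos, length))
            ref_pos += length
        elif op == "I":
            events.append(("insertion", ref_pos, length))
        elif op in "M=XN":
            ref_pos += length  # these operations consume reference bases
        # S, H and P consume no reference bases
    return events


@dataclass
class Segment:
    """One block of a split contig alignment (hypothetical record type)."""
    chrom: str
    strand: str     # "+" or "-"
    ref_start: int  # 1-based, inclusive
    ref_end: int
    q_start: int    # contig coordinates, 1-based, inclusive
    q_end: int


def classify_split(a: Segment, b: Segment) -> str:
    """Classify two adjacent blocks, where a precedes b along the contig.

    Simplified rules: overlap caused by microhomology is not handled.
    """
    if a.chrom != b.chrom:
        return "translocation"
    if a.strand != b.strand:
        return "inversion"
    # Reference bases skipped between the blocks, measured in the
    # direction the contig runs along the genome.
    if a.strand == "+":
        ref_gap = b.ref_start - a.ref_end - 1
    else:
        ref_gap = a.ref_start - b.ref_end - 1
    contig_gap = b.q_start - a.q_end - 1  # unaligned contig bases between blocks
    if ref_gap < 0:
        return "duplication"  # b re-aligns to reference already covered by a
    if ref_gap > contig_gap:
        return "deletion"
    if contig_gap > ref_gap:
        return "insertion"
    return "unclassified"
```

For example, `find_indels(100, "50M3D20M2I30M")` returns `[("deletion", 150, 3), ("insertion", 173, 2)]`, while an `N` operation from a spliced alignment is skipped rather than reported as a deletion.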
A pipeline called TAP (Transabyss-Alignment-PAVFinder), which bundles the three analysis steps (assembly, alignment and variant detection), is provided to facilitate whole-transcriptome analysis. TAP can also be run in a targeted mode on selected genes; this requires a Bloom filter of the target gene sequences to be created beforehand. Whereas a full assembly of a single RNA-seq library with over 100 million read pairs takes more than 24 hours, targeted assembly and analysis of a list of several hundred genes (e.g. COSMIC) can be completed within half an hour.
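A Bloom filter of target sequences is typically used to select the reads that share k-mers with those sequences before assembly; the Python sketch below assumes that is its role in TAP's targeted mode. It is not TAP's implementation: the k-mer size (25), filter size, number of hash functions, the `min_hits` threshold and all function names are hypothetical choices for illustration.

```python
import hashlib
from typing import Iterable, Iterator

# Complement table for canonical k-mers (reads may come from either strand).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every k-length substring of seq (nothing if seq is shorter than k)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


class BloomFilter:
    """Bit array probed by num_hashes hash functions.

    Membership tests can give false positives but never false negatives.
    """

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions by prefixing the item with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def build_target_filter(gene_seqs: Iterable[str], k: int = 25,
                        num_bits: int = 1 << 20, num_hashes: int = 3) -> BloomFilter:
    """Insert the canonical k-mers of every target gene sequence (hypothetical helper)."""
    bf = BloomFilter(num_bits, num_hashes)
    for seq in gene_seqs:
        for kmer in kmers(seq.upper(), k):
            bf.add(canonical(kmer))
    return bf


def read_matches(bf: BloomFilter, read: str, k: int = 25, min_hits: int = 2) -> bool:
    """Keep a read if at least min_hits of its k-mers appear to be in the filter."""
    hits = 0
    for kmer in kmers(read.upper(), k):
        if canonical(kmer) in bf:
            hits += 1
            if hits >= min_hits:
                return True
    return False
```

Because the filter is approximate, a few off-target reads may pass, which costs some extra assembly time but never discards a read that truly shares enough k-mers with a target. The k-mer size passed to `read_matches` must match the one used to build the filter.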
Fusion-bloom, a new pipeline that couples PAVFinder with our RNA-seq assembler RNA-Bloom, has been added to the repository. We have shown that it achieves higher sensitivity and specificity than most state-of-the-art fusion callers.
TAP2, the next version of TAP, which uses RNA-Bloom instead of Trans-ABySS for better transcriptome assembly, has been released.
Readman Chiu, Ka Ming Nip, Justin Chu and Inanc Birol. TAP: a targeted clinical genomics pipeline for detecting transcript variants using RNA-seq data. BMC Med Genomics (2018) 11:79 https://doi.org/10.1186/s12920-018-0402-6
Readman Chiu, Ka Ming Nip and Inanc Birol. Fusion-Bloom: fusion detection in assembled transcriptomes. Bioinformatics (2019) btz902 https://doi.org/10.1093/bioinformatics/btz902