Collection of code for a paired somatic variant calling pipeline.
- bwa-mem alignment to reference FASTA
[https://github.com/samtools/samtools]
- samtools addreplacerg > fixmate > sort
- samtools markdup
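
The step order above can be sketched as a small command builder. This is a minimal sketch, not the pipeline's actual code: the function names, file names, thread count, and read-group values are hypothetical, and the commands are only assembled, never executed. It assumes `samtools fixmate -m` is used so that `samtools markdup` has the mate score tags it needs.

```python
import shlex


def alignment_pipeline(ref, fq1, fq2, sample, prefix, threads=4):
    """Build (but do not run) the alignment commands in pipeline order.

    All file names and the read-group values are illustrative assumptions.
    """
    sorted_bam = f"{prefix}.sorted.bam"
    markdup_bam = f"{prefix}.markdup.bam"
    # Commands joined by pipes: align, tag read group, fix mates, coordinate sort.
    piped = [
        ["bwa", "mem", "-t", str(threads), ref, fq1, fq2],
        ["samtools", "addreplacerg", "-r", f"ID:{sample}", "-r", f"SM:{sample}", "-"],
        ["samtools", "fixmate", "-m", "-", "-"],
        ["samtools", "sort", "-@", str(threads), "-o", sorted_bam, "-"],
    ]
    # Duplicate marking runs on the coordinate-sorted BAM.
    final = ["samtools", "markdup", sorted_bam, markdup_bam]
    return piped, final


def as_shell(piped, final):
    """Render the command lists as a single shell line for inspection."""
    return " | ".join(shlex.join(cmd) for cmd in piped) + " && " + shlex.join(final)


if __name__ == "__main__":
    piped, final = alignment_pipeline(
        "ref.fa", "tumor_R1.fq.gz", "tumor_R2.fq.gz", "tumor", "tumor"
    )
    print(as_shell(piped, final))
```

Building the commands as lists keeps quoting explicit and makes the order easy to check before anything is run.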
[https://broadinstitute.github.io/picard/command-line-overview.html]
- picardtools CollectAlignmentSummaryMetrics and CollectHsMetrics
- Run on the samtools markdup output
[https://github.com/nygenome/lancet]
Lancet uses a localized micro-assembly strategy to detect somatic mutations with high sensitivity and accuracy on a tumor/normal pair. Lancet is based on the colored de Bruijn graph assembly paradigm, where tumor and normal reads are jointly analyzed within the same graph.
[https://github.com/Illumina/manta]
Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. Manta discovers, assembles and scores large-scale SVs, medium-sized indels and large insertions within a single efficient workflow.
[https://gatk.broadinstitute.org/hc/en-us/articles/360037593851-Mutect2]
[https://github.com/broadinstitute/gatk/releases]
Call somatic short mutations via local assembly of haplotypes. Short mutations include SNVs and INDELs.
[https://github.com/Illumina/strelka]
Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation in small cohorts and somatic variation in tumor/normal sample pairs. The somatic calling model improves on the original Strelka method for liquid and late-stage tumor analysis by accounting for possible tumor cell contamination in the normal sample. A final empirical variant re-scoring step using random forest models trained on various call quality features has been added to both callers to further improve precision.
[https://github.com/dkoboldt/varscan]
VarScan employs a robust heuristic/statistic approach to call variants that meet desired thresholds for read depth, base quality, variant allele frequency, and statistical significance.
[https://github.com/AstraZeneca-NGS/VarDictJava]
VarDict is an ultra sensitive variant caller for both single and paired sample variant calling from BAM files. VarDict implements several novel features such as amplicon bias aware variant calling from targeted sequencing experiments, rescue of long indels by realigning bwa soft clipped reads and better scalability than many Java based variant callers.
[https://cran.r-project.org/web/packages/sequenza/index.html]
Tools to analyze genomic sequencing data from paired normal-tumor samples, including cellularity and ploidy estimation; mutation and copy number (allele-specific and total copy number) detection, quantification and visualization.
[https://github.com/mskcc/facets]
Algorithm to implement Fraction and Allele specific Copy number Estimate from Tumor/normal Sequencing.
[https://github.com/samtools/bcftools]
- bcftools concat > sort
- Combine somatic SNP/indel VCF data for each variant caller, then sort by position
- bcftools isec
- Produces an intersection matrix of the variants, which can be used to evaluate caller performance/metrics
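
To illustrate what an intersection matrix of callers looks like, here is a minimal pure-Python sketch. It is not how `bcftools isec` is implemented; it only shows the idea of one row per variant and one presence flag per caller. The function name, caller names, and example variants are hypothetical.

```python
from collections import defaultdict


def intersection_matrix(calls_by_caller):
    """Return (callers, rows), where each row is (variant_key, presence_flags).

    calls_by_caller maps a caller name to an iterable of
    (chrom, pos, ref, alt) tuples. Flags follow the sorted caller order.
    """
    callers = sorted(calls_by_caller)
    presence = defaultdict(lambda: [0] * len(callers))
    for column, caller in enumerate(callers):
        for key in calls_by_caller[caller]:
            presence[key][column] = 1
    # Sort rows by variant key so the output is deterministic.
    return callers, sorted(presence.items())


if __name__ == "__main__":
    # Made-up calls, for illustration only.
    calls = {
        "mutect2": [("chr1", 100, "A", "T"), ("chr1", 200, "G", "C")],
        "strelka2": [("chr1", 100, "A", "T")],
    }
    callers, rows = intersection_matrix(calls)
    print("\t".join(["variant"] + callers))
    for key, flags in rows:
        print("\t".join([":".join(map(str, key))] + [str(f) for f in flags]))
```

Summing each row gives the number of callers supporting a variant, which is one simple way to compare callers.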
[https://varlociraptor.github.io/landing/]
- Calls SNVs, MNVs, indels, arbitrary replacements, inversions, duplications, haplotype blocks (combinations of any of the previous), and breakends.
- Supports all length ranges (from small to structural) with a unified statistical model.
- The statistical model entails all possible sources of uncertainty (mapping, typing, heterogeneity) and biases (strand, read pair orientation, read position, sampling, contamination, homologous regions).
- Resulting variant calls can be filtered by false discovery rate. No parameter tuning necessary.
- Maximum a posteriori allele frequency estimates are provided with each call.
- germline
- 'normal:0.5 | normal:1.0'
- somatic_tumor
- 'normal:0.0 & tumor:]0.0,1.0] & !$ffpe'
- somatic_normal
- 'normal:]0.0,0.5['
- (optional) ffpe_artifact
- '(C>T | G>A) & ((tumor:0.0 & normal:]0.0,0.1]) | (tumor:]0.0,0.1] & normal:0.0))'
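
The allele frequency ranges in the event definitions above use bracket notation: a bracket facing outward (`]0.0,` or `,0.5[`) excludes the bound, a bracket facing inward includes it. The sketch below only illustrates that notation on a single frequency value. It is not Varlociraptor's model, which assigns posterior probabilities to events rather than thresholding a point estimate, and the function names are hypothetical.

```python
def parse_interval(spec):
    """Parse '0.5', '[a,b]', ']a,b[', ']a,b]' or '[a,b[' into bounds.

    Returns (low, high, low_closed, high_closed).
    """
    spec = spec.strip()
    if spec[0] not in "[]":
        # A bare number is treated as the single point [x, x].
        value = float(spec)
        return value, value, True, True
    low_closed = spec[0] == "["
    high_closed = spec[-1] == "]"
    low, high = (float(part) for part in spec[1:-1].split(","))
    return low, high, low_closed, high_closed


def vaf_matches(spec, vaf):
    """Check whether an allele frequency lies inside the interval spec."""
    low, high, low_closed, high_closed = parse_interval(spec)
    above = vaf >= low if low_closed else vaf > low
    below = vaf <= high if high_closed else vaf < high
    return above and below


if __name__ == "__main__":
    for spec, vaf in [("]0.0,0.5[", 0.25), ("]0.0,0.5[", 0.5), ("]0.0,1.0]", 1.0)]:
        print(spec, vaf, vaf_matches(spec, vaf))
```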
- HGVS
- ENSEMBL
- dbSNP
- SpliceAI
- gnomAD
- REVEL
- ClinVar
- LOFTEE
[https://ftp.ncbi.nlm.nih.gov/snp/latest_release/VCF/]
Filtered for common SNPs.
bcftools filter -i'TYPE="snp" & COMMON=1'
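
As a rough illustration of what the filter expression keeps, here is a pure-Python sketch of the same test on single VCF records. It is an approximation, not a reimplementation of bcftools: it assumes a SNP means every REF/ALT allele is a single base, and it accepts `COMMON` either as a bare INFO flag or as `COMMON=1`, since the encoding may differ between dbSNP releases. The function name and example records are hypothetical.

```python
def is_common_snp(vcf_line):
    """Return True if a tab-separated VCF record is a SNP flagged as common."""
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alts, info = fields[3], fields[4].split(","), fields[7]

    # Collect INFO entries; bare flags get an empty string as value.
    info_values = {}
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        info_values[key] = value

    # Assumption: a record is a SNP when all alleles are single bases.
    bases = set("ACGT")
    is_snp = ref in bases and all(alt in bases for alt in alts)
    is_common = info_values.get("COMMON") in ("", "1")
    return is_snp and is_common


if __name__ == "__main__":
    # Made-up records, for illustration only.
    records = [
        "chr1\t100\trs1\tA\tG\t.\t.\tRS=1;COMMON=1",
        "chr1\t200\trs2\tA\tAT\t.\t.\tRS=2;COMMON=1",
        "chr1\t300\trs3\tC\tT\t.\t.\tRS=3",
    ]
    for record in records:
        print(record.split("\t")[2], is_common_snp(record))
```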