nh13/adVNTR

adVNTR - A tool for genotyping VNTRs

adVNTR is a tool for genotyping Variable Number Tandem Repeats (VNTRs) from sequence data. It works with both NGS short reads (Illumina HiSeq) and SMRT reads (PacBio). It finds diploid repeat unit (RU) counts for VNTRs and identifies possible mutations in the VNTR sequences.

Software Requirements

  1. The following system packages are required:
    • python2.7
    • python-pip
    • python-tk
    • libz-dev
    • samtools

You can install these requirements on Ubuntu Linux by running sudo apt-get install python2.7 python-pip python-tk libz-dev samtools

  2. The following Python 2.7 packages are required:
    • biopython
    • pysam version 0.9.1.4 or above
    • cython
    • networkx version 1.11
    • scipy
    • joblib

You can install the required Python packages by running pip install -r requirements.txt

  3. In addition, ncbi-blast version 2.2.29 or above is required.
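
A quick way to confirm that the requirements above are in place is to try importing each Python package and to look for the command-line tools on the PATH. The script below is a minimal sketch of that check and is not part of adVNTR. The helper names are hypothetical, and blastn is assumed to be a representative executable from the ncbi-blast suite. It does not verify the version constraints on pysam, networkx, or ncbi-blast.

```python
import importlib

try:
    from shutil import which  # Python 3.3+
except ImportError:
    from distutils.spawn import find_executable as which  # Python 2.7

# Import names for the packages listed above (biopython imports as "Bio").
REQUIRED_MODULES = ["Bio", "pysam", "Cython", "networkx", "scipy", "joblib"]
# Executable names are assumptions: "blastn" stands in for the ncbi-blast suite.
REQUIRED_TOOLS = ["samtools", "blastn"]


def missing_modules(names):
    """Return the module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


def missing_tools(names):
    """Return the executables that are not found on the PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    print("Missing Python packages: %s" % (missing_modules(REQUIRED_MODULES) or "none"))
    print("Missing command-line tools: %s" % (missing_tools(REQUIRED_TOOLS) or "none"))
```

Run it with the same Python 2.7 interpreter that will run advntr.py, so that the import check reflects the right environment.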

Data Requirements

* To run adVNTR on trained VNTR models:
  • Download vntr_data.zip and extract it inside the project directory.

Alternatively, you can add a model for a custom VNTR. See add-custom-vntr-label for more information.
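
Extracting vntr_data.zip into the project directory can also be scripted. The following is a minimal sketch using Python's standard zipfile module. The function name is hypothetical, and it assumes that the archive unpacks into a vntr_data directory, which is the directory name the tool refers to; the download step is left out because this document does not give the archive's location.

```python
import os
import zipfile


def extract_vntr_data(archive_path, project_dir):
    """Extract the model archive into project_dir.

    Returns True if a vntr_data directory exists afterwards
    (assumption: the archive contains a top-level vntr_data folder).
    """
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(project_dir)
    return os.path.isdir(os.path.join(project_dir, "vntr_data"))


if __name__ == "__main__":
    # Assumes vntr_data.zip was downloaded into the current project directory.
    if not extract_vntr_data("vntr_data.zip", "."):
        print("Extraction finished, but no vntr_data directory was found.")
```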

Execution:

Use the following command to see the help for running the tool:

python advntr.py --help

The program outputs the RU count genotypes for all VNTRs in the vntr_data directory. To genotype a single VNTR, specify its ID with the --vntr_id <id> option.

Demo 1: input in BAM format

  • --alignment_file specifies the alignment file containing mapped and unmapped reads:
python advntr.py --alignment_file aligned_illumina_reads.bam --working_directory ./log_dir/
  • With --pacbio, adVNTR assumes the alignment file contains PacBio sequencing data:
python advntr.py --alignment_file aligned_pacbio_reads.bam --working_directory ./log_dir/ --pacbio
  • Use --frameshift to find possible frameshifts in the VNTR:
python advntr.py --alignment_file aligned_illumina_reads.bam --working_directory ./log_dir/ --frameshift

Demo 2: input in FASTA format

  • Use the following command to genotype the RU count from a FASTA file:
python advntr.py --fasta unaligned_illumina_reads.fasta --working_directory ./log_dir/
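
When genotyping many samples, it can help to assemble these command lines programmatically. The sketch below is a hypothetical wrapper, not part of adVNTR. It uses only the options shown in the demos above and in the --vntr_id description. Requiring exactly one of the two input options is an assumption of this sketch, not documented behavior of the tool.

```python
import subprocess


def build_advntr_command(working_directory, alignment_file=None, fasta=None,
                         pacbio=False, frameshift=False, vntr_id=None):
    """Assemble an advntr.py command line from the options shown above."""
    # Assumption: exactly one input (BAM alignment or FASTA) is given per run.
    if (alignment_file is None) == (fasta is None):
        raise ValueError("give exactly one of alignment_file or fasta")
    command = ["python", "advntr.py"]
    if alignment_file is not None:
        command += ["--alignment_file", alignment_file]
    else:
        command += ["--fasta", fasta]
    command += ["--working_directory", working_directory]
    if pacbio:
        command.append("--pacbio")
    if frameshift:
        command.append("--frameshift")
    if vntr_id is not None:
        command += ["--vntr_id", str(vntr_id)]
    return command


def run_advntr(**options):
    """Run adVNTR from the project directory; raises if it exits non-zero."""
    subprocess.check_call(build_advntr_command(**options))


if __name__ == "__main__":
    # Prints the PacBio command from Demo 1 without running it.
    print(" ".join(build_advntr_command("./log_dir/",
                                        alignment_file="aligned_pacbio_reads.bam",
                                        pacbio=True)))
```

Passing the command as a list avoids shell quoting problems with file names that contain spaces.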

Citation:

Bakhtiari, M., Shleizer-Burko, S., Gymrek, M., Bansal, V. and Bafna, V., 2017. Targeted Genotyping of Variable Number Tandem Repeats with adVNTR. bioRxiv, p.221754.
