hiplexpipe

A bioinformatics pipeline for variant calling for Hi-Plex sequencing.

Author: Khalid Mahmood (kmahmood@unimelb.edu.au)

hiplexpipe is based on the Ruffus library for writing bioinformatics pipelines. Its features include:

Job submission on a cluster using DRMAA (currently only tested with SLURM).
Job dependency calculation and checkpointing.
Pipeline can be displayed as a flowchart.
Re-running a pipeline will start from the most up-to-date stage. It will not redo previously completed tasks.
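
The re-run behaviour listed above can be illustrated with a minimal sketch of timestamp-based checkpointing. This is not hiplexpipe or Ruffus code: the function name needs_update is hypothetical, and Ruffus's own up-to-date checks are more elaborate. The sketch only shows the general idea that a stage can be skipped when all of its outputs exist and are at least as new as its inputs.

```python
import os


def needs_update(inputs, outputs):
    """Return True if a stage should be (re)run.

    Hypothetical helper: a stage is treated as stale when it declares no
    outputs, when any output is missing, or when the oldest output is
    older than the newest input.
    """
    if not outputs:
        return True
    if not all(os.path.exists(path) for path in outputs):
        return True
    if not inputs:
        return False
    newest_input = max(os.path.getmtime(path) for path in inputs)
    oldest_output = min(os.path.getmtime(path) for path in outputs)
    return oldest_output < newest_input
```

With this rule, touching an input file makes every stage that depends on it stale, while stages whose outputs are already newer than their inputs are left alone.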

License

See LICENSE.txt in source repository.

Installation

External tools dependencies

hiplexpipe depends on the following programs and libraries:

python (version 2.7.5)
java (version 1.8)
DRMAA for submitting jobs to the cluster (hiplexpipe uses the Python wrapper to do this). You need to install your own libdrmaa.so for your local job submission system. Versions are available for common schedulers such as Torque/PBS and SLURM.

SAMtools (version 1.3.1)
bwa for aligning reads to the reference genome (version 0.7.15)
GATK for calling variants and genotyping (version 3.6)
BEDTools for calculating sequencing coverage statistics (version 2.26.0)

hiplexpipe assumes the tools above are installed by the users themselves.
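
Because the tools are installed separately, it can be useful to confirm that they are visible before starting a run. The following is a small sketch, not part of hiplexpipe: the executable names in REQUIRED_TOOLS are assumptions about how the tools are named on your system, and GATK 3.6 is left out because it is distributed as a jar file rather than an executable on PATH. It uses shutil.which, which needs Python 3.3 or later; on Python 2.7, distutils.spawn.find_executable is the closest equivalent.

```python
import shutil

# Assumed executable names; adjust them to match your installation.
# GATK 3.6 ships as a jar file, so it is not looked up on PATH here.
REQUIRED_TOOLS = ["java", "samtools", "bwa", "bedtools"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required executables were found on PATH.")
```

This check only confirms that an executable exists; it does not verify the version numbers listed above.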

Python dependencies

hiplexpipe depends on the following python libraries, tools and wrappers.

Python 2.7.5
PyVCF
Biopython
pybedtools
cyvcf2

We recommend using a Python virtual environment. The following is an example of how to set up a hiplexpipe virtual environment ready for analysis:

Installation example on Melbourne Bioinformatics clusters

module load Python/2.7.10-vlsci_intel-2015.08.25
export DRMAA_LIBRARY_PATH=/usr/local/slurm_drmaa/1.0.7-GCC/lib/libdrmaa.so
virtualenv --system-site-packages venv
source venv/bin/activate
pip install -U https://github.com/khalidm/undr_rover/archive/master.zip
pip install -U https://github.com/khalidm/hiplexpipe/archive/master.zip
hiplexpipe --config pipeline.config --use_threads --log_file pipeline.log --jobs 10 --verbose 3 --just_print
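
The example above exports DRMAA_LIBRARY_PATH so that the Python DRMAA wrapper can locate libdrmaa.so. A wrong path is an easy mistake to make on a new cluster, so a check like the one sketched below can be run before submitting jobs. The function name drmaa_library_path is hypothetical and is not provided by hiplexpipe or by the DRMAA wrapper.

```python
import os


def drmaa_library_path(environ=None):
    """Return the DRMAA_LIBRARY_PATH value, or raise if it is unusable.

    Hypothetical helper: it only checks that the variable is set and
    points to an existing file. It does not try to load the library.
    """
    environ = os.environ if environ is None else environ
    path = environ.get("DRMAA_LIBRARY_PATH")
    if not path:
        raise RuntimeError("DRMAA_LIBRARY_PATH is not set")
    if not os.path.isfile(path):
        raise RuntimeError(
            "DRMAA_LIBRARY_PATH does not point to a file: " + path)
    return path
```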

Getting started

Step 1. Preparing the target region files

You should have two target interval files for every Hi-Plex experiment.

rover.txt - this contains the amplicon regions and primer sequences.
idt.txt - this file contains the primer sequences and their names matching the names in the above rover.txt file.

Follow the instructions below to prepare the interval files for the pipeline. (We are working on a tool to automate this task.)

Main rover bed file (rover.bed). This file is used to calculate alignment and coverage statistics. Create it with either of the following commands. The first keeps the original start and end coordinates; the second sets both the start and the end to the amplicon midpoint.

cut -f1,2,3,4,5 rover.txt > rover.bed

or

awk ' BEGIN{FS="\t";OFS="\t"}; { print $1,int($2+($3-$2)/2),int($3-($3-$2)/2),$4,$5} ' rover.txt > rover.bed

Interval file (rover.interval_list). This file is no longer required because the input BAM is now clipped. If needed, it can be created with:

java -jar picard.jar BedToIntervalList I=rover.bed SD=<hg19.dict> O=rover.interval_list

Primer coordinates file (primer.bedpe). This file is used to clip primer sequences from the alignments.

awk ' BEGIN{FS="\t";OFS="\t"}; { print $1,$7,$8,$1,$12,$11} ' rover.txt > primer.bedpe
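
For readers who prefer Python to awk, the column selections above can be sketched as follows. This is an illustration only and is not the automation tool mentioned earlier: the function names are hypothetical, and the sketch assumes rover.txt is tab-separated, has no header line, and has at least 12 columns. The midpoint option mirrors the awk alternative for rover.bed, and the column order for primer.bedpe follows the awk command ($1,$7,$8,$1,$12,$11).

```python
def rover_to_bed(fields, midpoint=False):
    """Build one rover.bed row from the tab-split fields of a rover.txt line.

    By default the first five columns are copied, like `cut -f1,2,3,4,5`.
    With midpoint=True, start and end are recomputed as in the awk variant.
    """
    row = list(fields[:5])
    if midpoint:
        start, end = int(row[1]), int(row[2])
        half = (end - start) / 2.0
        row[1] = str(int(start + half))
        row[2] = str(int(end - half))
    return row


def rover_to_bedpe(fields):
    """Build one primer.bedpe row from awk columns $1,$7,$8,$1,$12,$11."""
    return [fields[i] for i in (0, 6, 7, 0, 11, 10)]


def convert(rover_path, bed_path, bedpe_path, midpoint=False):
    """Write rover.bed and primer.bedpe from a tab-separated rover.txt."""
    with open(rover_path) as src, open(bed_path, "w") as bed, \
            open(bedpe_path, "w") as bedpe:
        for line in src:
            line = line.rstrip("\r\n")
            if not line:
                continue  # unlike the awk commands, blank lines are skipped
            fields = line.split("\t")
            bed.write("\t".join(rover_to_bed(fields, midpoint)) + "\n")
            bedpe.write("\t".join(rover_to_bedpe(fields)) + "\n")
```

A call such as convert("rover.txt", "rover.bed", "primer.bedpe") would write both files in one pass over the input.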
