GitHub - eafyounian/pypette: Pythonic utilities for the analysis of high throughput sequencing data

Introduction

Pypette is a collection of command-line utilities and libraries for analyzing biological data. Current functionality includes:

Gene expression quantification
Copy number analysis
Mutation and SNP analysis
Chromosomal rearrangement analysis
FASTA, VCF and SAM file manipulation
Other miscellaneous functionality

For best performance, run the software with PyPy, a high-performance implementation of the Python language. You can build PyPy from source or download portable binaries.

Installation

To install Pypette, download the latest release and extract it to a folder. Then run the Makefile and add the subfolder bin/ to your PATH. See below for an example:

wget --content-disposition https://github.com/annalam/pypette/archive/0.7.1.tar.gz
tar -xzf pypette-0.7.1.tar.gz
cd pypette-0.7.1
make
export PATH=/some/folder/pypette-0.7.1/bin:$PATH

Some Pypette functionality requires external software to be installed:

samtools: required for mutation and structural variant calling
Bowtie: required for structural variant calling
ANNOVAR: required for mutation annotation
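
Before running the examples, it can help to confirm that these external tools are reachable on your PATH. The following is a minimal sketch, not part of Pypette; the executable names (in particular `annotate_variation.pl` for ANNOVAR) are assumptions and may differ on your system:

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
REQUIRED_TOOLS = {
    "samtools": "mutation and structural variant calling",
    "bowtie": "structural variant calling",
    "annotate_variation.pl": "mutation annotation (ANNOVAR)",
}


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of tools that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    # Report every tool that could not be located.
    for name in find_missing_tools():
        print(f"missing: {name} (needed for {REQUIRED_TOOLS[name]})")
```

An empty result means every listed executable was found; it does not verify tool versions.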

Examples

Chromosomal rearrangements

Download and prepare a reference genome for use in split read analysis:

wget ftp://ftp.ccb.jhu.edu/pub/data/bowtie_indexes/hg19.ebwt.zip
unzip hg19.ebwt.zip
fasta flatten <(bowtie-inspect hg19) hg19

Identify structural variants based on read pairs and split reads. Split reads must overlap at least 25 bp on both sides of the breakpoint:

breakfast detect -a25 test.bam hg19 test
breakfast detect -a25 control.bam hg19 control
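
To illustrate the 25 bp requirement, here is a minimal sketch of the overlap condition. It is not breakfast's implementation; the function name and its parameters are hypothetical, and it assumes the breakpoint position is expressed as the number of read bases that align to its left:

```python
def has_sufficient_anchor(read_length, breakpoint_offset, min_anchor=25):
    """Check that a split read covers at least min_anchor bases on each side.

    breakpoint_offset is the number of read bases left of the breakpoint
    (a hypothetical representation chosen for this illustration).
    """
    left = breakpoint_offset
    right = read_length - breakpoint_offset
    return left >= min_anchor and right >= min_anchor
```

Under this assumption, a 100 bp read split 30 bases from its start would qualify, while the same read split 20 bases from its start would not.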

Construct a blacklist of false positive regions based on the control sample:

breakfast blacklist control.sv > blacklist.txt

Keep only structural variants supported by at least one read pair and three split reads, or by at least 10 split reads (a variant passes if it satisfies at least one -r option). Also discard blacklisted structural variants found in the control:

breakfast filter -r 1-3-0 -r 0-10-0 --blacklist=blacklist.txt test.sv > filtered.sv
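
The way the two -r options combine can be sketched as follows. This is an illustration of the "at least one rule must be satisfied" logic only, not breakfast's code: the function names are hypothetical, and because this README does not describe the third field of each rule, the sketch ignores it.

```python
def parse_rule(text):
    """Parse a rule such as '1-3-0' into a tuple of integers."""
    return tuple(int(field) for field in text.split("-"))


def passes(read_pairs, split_reads, rules):
    """Return True if the evidence satisfies at least one rule.

    Only the first two fields (read pairs, split reads) are checked.
    The third field is not explained in the README, so it is ignored here.
    """
    for min_pairs, min_split, *_ in rules:
        if read_pairs >= min_pairs and split_reads >= min_split:
            return True
    return False


# The two rules from the command above.
rules = [parse_rule("1-3-0"), parse_rule("0-10-0")]
```

With these rules, a variant with one read pair and three split reads passes via the first rule, and a variant with no read pairs but ten split reads passes via the second.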

Annotate structural variants with information about nearby genes:

wget ftp://ftp.ensembl.org/pub/release-74/gtf/homo_sapiens/Homo_sapiens.GRCh37.74.gtf.gz
gtf to gene bed Homo_sapiens.GRCh37.74.gtf.gz > ensembl_genes.bed
breakfast annotate -b ensembl_genes.bed filtered.sv > annotated.sv
