jsgounot/PgPy

PGPY

PgPy is a Python library designed for population genomic analysis. It is written in Python and reads VCF files through the pysam library, which allows fast iteration over whole-genome data. The current release is developed under Python 3.6; Python 2.x is not supported.

Installing PgPy

Using pip

pip install git+https://github.com/jsgounot/PgPy.git

Or clone the GitHub repository

git clone https://github.com/jsgounot/PgPy.git
cd PgPy
python setup.py install --user

Input dataset

The main purpose of this library is to work with a single merged VCF file built from the sequencing data of multiple samples. Multiple VCFs can be merged into one with the vcf-merge script from vcftools. The final VCF file must be compressed with bgzip (distributed with tabix) and then indexed with tabix; pysam reads the compressed file directly. If you want to work with snpEff results, do not forget to annotate your merged VCF file during the process.
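The preparation steps above can be sketched as a small Python helper that only builds the shell command lines (it does not run them). This is an illustration, not part of PgPy: the function name `preparation_commands` and the file names are hypothetical, and it assumes every per-sample input is already bgzip-compressed and tabix-indexed, which vcf-merge requires.

```python
import shlex


def preparation_commands(sample_vcfs, merged_vcf):
    """Return the shell command lines that merge, compress, and index VCFs.

    Hypothetical helper: assumes each input is already bgzip-compressed
    and tabix-indexed, as vcf-merge expects.
    """
    inputs = " ".join(shlex.quote(path) for path in sample_vcfs)
    out = shlex.quote(merged_vcf)
    return [
        # vcf-merge writes the merged VCF to stdout; bgzip compresses it.
        f"vcf-merge {inputs} | bgzip -c > {out}",
        # tabix builds the index next to the compressed file.
        f"tabix -p vcf {out}",
    ]


if __name__ == "__main__":
    for line in preparation_commands(["a.vcf.gz", "b.vcf.gz"], "merged.vcf.gz"):
        print(line)
```

Review the printed lines before running them in a shell. If you annotate with snpEff, that step would go between merging and compression.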

Quick view

PgPy has been designed to be minimalist and flexible. The introduction guide gives a first overview of what is possible, and PgPy also provides several recipes that show how it works. In short, PgPy lets you:

  • Iterate easily through variants along the whole genome or only a part of it (based on the tabix support provided by pysam)
  • Quickly produce alignments with inferred SNPs and / or indels
  • Work within a Python environment and interface easily with the BioPython library
  • Modify SNPs "on the fly", for example by converting heterozygous SNPs to IUPAC codes
  • Use multiprocessing to speed up processing by parallelizing operations across chromosomes or regions
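
To make the IUPAC and alignment points concrete, here is a minimal standalone sketch in plain Python. It does not use or mirror PgPy's API: the names `IUPAC_CODES`, `genotype_to_base`, and `build_alignment` are hypothetical, and genotypes are assumed to be tuples of allele strings with `None` for a missing call.

```python
# Hypothetical helpers for illustration; not PgPy functions.

# IUPAC ambiguity codes for unordered pairs of distinct bases.
IUPAC_CODES = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def genotype_to_base(alleles):
    """Collapse a diploid SNP genotype such as ("A", "G") into one character."""
    # Missing calls and non-SNP alleles (indels) become N in this sketch.
    if any(a is None or len(a) != 1 for a in alleles):
        return "N"
    distinct = frozenset(a.upper() for a in alleles)
    if len(distinct) == 1:
        return next(iter(distinct))  # homozygous: keep the base
    return IUPAC_CODES.get(distinct, "N")  # heterozygous: ambiguity code


def build_alignment(sites):
    """Turn a list of {sample: (allele1, allele2)} sites into {sample: sequence}."""
    samples = sorted({name for site in sites for name in site})
    return {
        name: "".join(
            genotype_to_base(site.get(name, (None, None))) for site in sites
        )
        for name in samples
    }


if __name__ == "__main__":
    sites = [
        {"s1": ("A", "A"), "s2": ("A", "G")},
        {"s1": ("C", "T"), "s2": ("T", "T")},
        {"s1": (None, None), "s2": ("G", "C")},
    ]
    print(build_alignment(sites))
```

Each sample gets one character per variant site, so the resulting strings have equal length and can be passed on to an alignment-aware tool such as BioPython.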
