Skip to content

kcha/pyconserve

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Python package that helps intersect BED files with sequence conservation tracks.

This package was inspired by the need to extract conservation scores for a given set of BED intervals. It is based on a solution described by Dave Tang in this blog post.

pyconserve takes as input the following:

  1. A BED file containing the intervals of interest
  2. Phastcons or PhyloP bedGraph files (one per chromosome)

It then uses [pybedtools] to perform genomic intersections between the input BED file and bedGraph files. As each chromosome is queried independently, this can be run using parallelization to speed up runtime.

Note: this software is in beta and may contain bugs.

Getting started

Installation

  1. Install bedtools

  2. Clone the pyconserve GitHub repo

     git clone https://www.github.com/kcha/pyconserve.git
     cd pyconserve
    
  3. Install the package

     python setup.py install
    

Preparing conservation files

The following steps describe how to download conservation tracks and convert them to bedGraph format using the UCSC tools wigToBigWig and bigWigToBedGraph. These steps are adapted from this blog post.

  1. Download Phastcons or PhyloP wiggle files (one chromosome per file) from UCSC

  2. Convert wiggle to bigWig using wigToBigWig.

     wigToBigWig chr1.phastCons100way.wigFix.gz hg19.chrom_sizes.txt \
         chr1.phastCons100way.bigWig
    

    This step requires a file containing the chromosome sizes of your species.

  3. Convert bigWig to bedGraph using bigWigToBedGraph

     bigWigToBedGraph chr1.phastCons100way.bigWig chr1.phastCons100way.bedGraph
    
  4. (optional) To save space, compress the bedGraph file and remove the wiggle and bigWig files

     rm -v chr1.phastCons100way.wigFix.gz chr1.phastCons100way.bigWig
     gzip -vf chr1.phastCons100way.bedGraph
    

Run pyconserve

Given a BED file and a set of bedGraph conservation files, pyconserve can be run as follows:

pyconserve a.bed chr*.phastCons100way.bedGraph.gz > a_conservation.bed

This will perform bedtools intersect and save the output to a_conservation.bed.

Aggregate conservation scores

To aggregate these results by computing the mean conservation score for each interval in the BED file, use the command summarize_conserve:

summarize_conserve a_conservation.bed > summarized.bed

Alternatively, the above two commands can be chained together as follows:

pyconserve a.bed chr*.phastCons100way.bedGraph.gz | \
    summarize_conserve - > summarized.bed

Acknowledgements

This software is inspired by previous work from others:

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages