This package is designed to analyse data from multi-contact Pore-C reads. It is similar in scope to the pairtools package, but it is specifically designed to handle multi-contact reads (aka c-walks). It carries out the following operations:
- pre-processing a reference genome to generate auxiliary files used in downstream analyses
- creating virtual digests of the reference genome
- processing read-sorted BAM files to filter spurious alignments, detect ligation junctions and assign fragments
- converting the resulting contacts to a pairs format and a COO-formatted matrix compatible with Cooler for downstream processing.
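The digest, fragment-assignment and contact-conversion steps above can be illustrated with a toy sketch. It is not the pore_c implementation or API: the functions `virtual_digest`, `assign_fragment` and `walk_to_pairs` are hypothetical names. The sketch makes several assumptions. Each alignment is assigned to the fragment containing its midpoint. Repeated hits to the same fragment are collapsed. A c-walk is expanded into all pairwise combinations of its fragments.

```python
import bisect
import itertools
import re
from typing import List, Tuple


def virtual_digest(seq: str, site: str, cut_offset: int = 0) -> List[int]:
    """Hypothetical helper: return fragment end coordinates (0-based, exclusive).

    Fragment i spans [ends[i-1], ends[i]), with the first fragment starting at 0.
    Only one strand is scanned, which is sufficient for palindromic sites such
    as HindIII (AAGCTT) and NlaIII (CATG).
    """
    # Lookahead pattern so overlapping recognition sites are all reported.
    pattern = re.compile(f"(?=({re.escape(site.upper())}))")
    cuts = [m.start() + cut_offset for m in pattern.finditer(seq.upper())]
    # Discard cuts at the chromosome edges, which would create empty fragments.
    ends = [c for c in cuts if 0 < c < len(seq)]
    ends.append(len(seq))  # the chromosome end closes the last fragment
    return ends


def assign_fragment(ends: List[int], start: int, end: int) -> int:
    """Assumption: assign an alignment to the fragment containing its midpoint."""
    midpoint = (start + end) // 2
    # bisect_right gives the index i with ends[i-1] <= midpoint < ends[i].
    return bisect.bisect_right(ends, midpoint)


def walk_to_pairs(fragment_ids: List[int]) -> List[Tuple[int, int]]:
    """Assumption: expand a c-walk into all pairwise fragment contacts."""
    unique = sorted(set(fragment_ids))  # collapse repeated hits to one fragment
    return list(itertools.combinations(unique, 2))


if __name__ == "__main__":
    # Toy chromosome with two HindIII sites; HindIII cuts A^AGCTT (offset 1).
    chrom = "AAAGCTTCCCC" + "AAGCTTGG"
    ends = virtual_digest(chrom, "AAGCTT", cut_offset=1)
    print(ends)  # [2, 12, 19]

    # Alignments (start, end) from one hypothetical multi-contact read.
    alignments = [(0, 2), (3, 10), (13, 18), (4, 9)]
    walk = [assign_fragment(ends, s, e) for s, e in alignments]
    print(walk)  # [0, 1, 2, 1]
    print(walk_to_pairs(walk))  # [(0, 1), (0, 2), (1, 2)]
```

Expanding a walk into every pairwise combination makes the output compatible with pairs-based tools such as Cooler. The cost is that a walk touching n distinct fragments yields n(n-1)/2 contacts, so higher-order information is kept only in the per-read records.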
There is an associated Pore-C-Snakemake pipeline that wraps the pore-C commands. It also handles some analysis steps outside the scope of pore-C tools, such as read alignment and conversion of output files to .cool format. This is the recommended way to run pore-C tools.
There are some sample datasets (fastq, alignment parquets, .pairs, .cool files) available for HindIII-digested HG002 (31Gb) here and for NlaIII-digested GM12878 (23Gb) here.
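To install from source, create and activate the conda environment, install the package in editable mode, then check the command-line interface and run the test suite:
conda env create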
source activate poreC
pip install -e .
pore_c --help
pytest tests
The bioRxiv preprint describing Pore-C can be found here:
Nanopore sequencing of DNA concatemers reveals higher-order features of chromatin structure