#Reference MHC Project Analysis Codebase
This is a set of scripts developed by Ngan Nguyen and Benedict Paten to generate Tables and Figures for the Reference MHC project.
##Dependencies
- jobTree
- matplotlib
- sonLib
- Python 2.6 or later, but earlier than 3.0
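
To confirm that an environment matches the list above, a small check script can help. This is a minimal sketch and not part of the package: the import names `jobTree`, `sonLib`, and `matplotlib` are assumed to match the dependency names, and `check_version` and `missing_modules` are hypothetical helpers introduced only for illustration.

```python
import sys

# Assumed import names; adjust if your installs expose different module names.
REQUIRED_MODULES = ("jobTree", "sonLib", "matplotlib")


def check_version(version_info):
    """Return True for Python >= 2.6 and < 3.0."""
    return (2, 6) <= tuple(version_info[:2]) < (3, 0)


def missing_modules(names):
    """Return the names in `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    if not check_version(sys.version_info):
        print("Unsupported Python version: %s" % sys.version.split()[0])
    for name in missing_modules(REQUIRED_MODULES):
        print("Missing dependency: %s" % name)
```

The script only reports problems and never exits with an error, so it is safe to run before installation.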
##Installation
- Download the package
- `cd` into referenceViz/src/
- Type `make`
- Add the referenceViz/bin/ directory to your PATH
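
After the last step, it may be worth verifying that the bin directory is really on your PATH. The following is a minimal sketch; `is_on_path` is a hypothetical helper, and the relative path `referenceViz/bin` assumes the script is run from the directory where the package was unpacked.

```python
import os


def is_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries in PATH."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = os.path.abspath(directory)
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    return any(os.path.abspath(entry) == target for entry in entries)


if __name__ == "__main__":
    # Assumed location; replace with where you unpacked the package.
    bin_dir = os.path.join("referenceViz", "bin")
    print("%s on PATH: %s" % (bin_dir, is_on_path(bin_dir)))
```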
##Run
- `getPlot.py`: a wrapper to run various analyses, including:
    - Contiguity
    - Coverage
    - N50
    - SNP rate
    - Indel rate
    - Indel length distribution
    - CNV
    - dbSNP and 1000 Genomes project SNPs and short indels (<= 10bp) validation

  Inputs include:
    - Location of the output directory after running the referenceScript pipeline
    - If dbSNP and 1000 Genomes project SNPs validation is included, the dbSNP file with the known SNPs is required.
    - If dbSNP and 1000 Genomes project indels validation is included, the dbSNP file with the known indels is required.
    - The SNP files must be in the tab-separated format of the fields specified here, excluding the "bin" field.
- `mapReadsToRef.py`: script to compare mapping of short reads to two different reference sequences (i.e., the C. Ref. sequence and the GRCh37 sequence)
- `mapLargeIndels.py`: script to extract the large-indel sequences and map them to the reference.
    - `largeIndelTab.py`: generates the summary table of the large indel mapping.
- `uniqMapStats.py`: script to compare reads that map uniquely to one reference but not uniquely to the other.
    - `uniqMapTab.py`: generates the summary table for uniquely-mapping comparisons.
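
The tab-separated SNP file that `getPlot.py` expects can be sanity-checked before a run. The sketch below is an illustration only and is not how `getPlot.py` reads its input. The field specification referred to above is not reproduced in this README, so the column order here is an assumption: it follows the UCSC Genome Browser snp table with the leading "bin" column removed, which leaves `chrom`, `chromStart`, `chromEnd`, and `name` as the first four fields. `SnpRecord` and `parse_snp_lines` are hypothetical names.

```python
from collections import namedtuple

# Assumed leading columns (UCSC snp table order with "bin" removed).
SnpRecord = namedtuple("SnpRecord", ["chrom", "start", "end", "name"])


def parse_snp_lines(lines):
    """Yield one SnpRecord per non-empty, non-comment tab-separated line."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            raise ValueError("expected at least 4 tab-separated fields: %r" % line)
        # Any further columns are ignored in this sketch.
        yield SnpRecord(fields[0], int(fields[1]), int(fields[2]), fields[3])


if __name__ == "__main__":
    # Made-up record for illustration; not real dbSNP data.
    example = ["chr6\t100\t101\trsExample\t0\t+\n"]
    for record in parse_snp_lines(example):
        print(record)
```

Passing an open file handle instead of a list works the same way, since the function only iterates over lines. A line with fewer than four fields raises `ValueError`, which catches files that are space-separated rather than tab-separated.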