CIPI: Context-based Inference of Point-mutation Influence

Detecting context-dependent mutation rates from NGS data.
We designed a statistical method that allows for detection and evaluation of the effects of different motifs on mutation rates.

Reference:

"CIPI: A Bayesian framework for inferring the influence of sequence context on point mutations", Guy Ling, Danielle Miller, Rasmus Nielsen, Adi Stern

Getting started:

A Jupyter notebook for running the code is available here, with two optional modes:

simulate
real data inference

The input frequency file can be either a CSV file or a tab-delimited file (.freq).
Note that simulation results have an additional line: #moran model results

#moran model results
Pos    Base    Freq    Ref    Read_count    Rank    Prob
1    A    0.9826    A    10000    3    1.0
1    C    0.0012    A    10000    0    1.0
1    G    0.0058    A    10000    1    1.0
1    T    0.0104    A    10000    2    1.0
2    A    0.9852    A    10000    3    1.0
2    C    0.0053    A    10000    1    1.0
2    G    0.0030    A    10000    0    1.0
2    T    0.0065    A    10000    2    1.0
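
As a minimal sketch of how a file in this layout could be loaded (this is not the code used by CIPI itself), the snippet below skips any line starting with `#`, accepts either comma or whitespace/tab delimiters, and lists the non-reference bases at each position. The function names `read_freq_table` and `mutation_frequencies` are hypothetical, and the sketch assumes the column names shown above.

```python
import io


def read_freq_table(handle):
    """Parse a frequency file into a list of row dicts.

    Lines starting with '#' (e.g. '#moran model results') are skipped.
    Both comma-delimited and whitespace/tab-delimited input are accepted.
    """
    rows = []
    header = None
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(",") if "," in line else line.split()
        fields = [f.strip() for f in fields]
        if header is None:
            header = fields  # first non-comment line holds the column names
            continue
        row = dict(zip(header, fields))
        row["Pos"] = int(row["Pos"])
        row["Freq"] = float(row["Freq"])
        rows.append(row)
    return rows


def mutation_frequencies(rows):
    """Keep only rows whose base differs from the reference base."""
    return [(r["Pos"], r["Ref"], r["Base"], r["Freq"])
            for r in rows if r["Base"] != r["Ref"]]


# Position 1 from the example above, written with tab delimiters.
example = io.StringIO(
    u"#moran model results\n"
    u"Pos\tBase\tFreq\tRef\tRead_count\tRank\tProb\n"
    u"1\tA\t0.9826\tA\t10000\t3\t1.0\n"
    u"1\tC\t0.0012\tA\t10000\t0\t1.0\n"
    u"1\tG\t0.0058\tA\t10000\t1\t1.0\n"
    u"1\tT\t0.0104\tA\t10000\t2\t1.0\n"
)

rows = read_freq_table(example)
muts = mutation_frequencies(rows)
for pos, ref, base, freq in muts:
    print("{0}\t{1}>{2}\t{3}".format(pos, ref, base, freq))
```

With a real file, the same function can be given an open file handle instead of the in-memory example.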

Frequency files can be generated from raw NGS data (fastq files) with our AccuNGS pipeline:
"AccuNGS: detecting ultra-rare variants in viruses from clinical samples." (Gelbart, M., et al. 2018).
Code available here.

Outputs

The script creates a folder named resultSummary with two types of files:

MutationFilename_allChainsSummary.csv - all MCMC chains
MutationFilename_bestChainsSummary.csv - the final file for analysis

Each file should include all possible motifs and their final beta and gamma:

betaMean	gammaMean	betaSD	gammaSD	topBeta	bottomBeta	topGamma	bottomGamma	name
-0.035037904	0	0.370295511	0	0.610536338	-0.644909518	0	0	P4__A
0.111123401	0	0.253151736	0	0.376907698	-0.847964432	0	0	P4__C
1.882467488	1	3.86E-11	0	1.882467488	1.882467488	1	1	P4__G
0.048830458	0	0.32543011	0	0.713381789	-0.511792261	0	0	P4__T
0.159739611	0	0.191817238	0	0.505363867	-0.320087558	0	0	P1_P4__AG
-0.11833743	0	0.259224332	0	0.455072615	-0.474370422	0	0	P1_P4__AT
-0.076704198	0	0.190523941	0	0.304878287	-0.463398403	0	0	P1_P4__CG

Here we can see that the motif P4__G is significant, as gammaMean = 1.
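
The same reading of the summary file can be sketched in a few lines of standard-library Python. This is an illustration only: the function name `significant_motifs` and the 0.5 cutoff on gammaMean are assumptions made here, not part of CIPI, and the sketch assumes the .csv file is comma-delimited with the column names shown above.

```python
import csv
import io


def significant_motifs(handle, gamma_threshold=0.5):
    """Return (name, betaMean) for motifs with gammaMean >= gamma_threshold.

    gamma_threshold is a hypothetical cutoff chosen for this sketch.
    """
    hits = []
    for row in csv.DictReader(handle):
        if float(row["gammaMean"]) >= gamma_threshold:
            hits.append((row["name"], float(row["betaMean"])))
    return hits


# Three rows taken from the example summary above, comma-delimited.
example = io.StringIO(
    u"betaMean,gammaMean,betaSD,gammaSD,topBeta,bottomBeta,topGamma,bottomGamma,name\n"
    u"-0.035037904,0,0.370295511,0,0.610536338,-0.644909518,0,0,P4__A\n"
    u"1.882467488,1,3.86E-11,0,1.882467488,1.882467488,1,1,P4__G\n"
    u"0.159739611,0,0.191817238,0,0.505363867,-0.320087558,0,0,P1_P4__AG\n"
)

hits = significant_motifs(example)
for name, beta in hits:
    print("{0}\tbetaMean={1}".format(name, beta))
```

On these rows only P4__G passes the cutoff, matching the observation above.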

Gamma and beta posterior plots will also be saved. To only present the figures, go to GammaPriorTest.py and set plot=True under outputAnalysis.

Running time depends on genome length (up to an hour for 7k bases).
For multiple simulations we recommend running a distributed script rather than the attached Jupyter notebook.

Requirements

Python 2.7 with the following packages installed:

numpy
pandas
Bio (Biopython)
pymc3
theano
matplotlib
seaborn
tqdm
