
helmutsimon/ProbPolymorphism


ProbPolymorphism

Source code for "Quantifying Influences on Intragenomic Mutation Rate"

context folder

File: sample_ensembl.py
Type: Python 3.5 script
Purpose: To analyse the Ensembl variation database and create files of variant details and intron locations, from which the contextual influence can be calculated.
Requirements: Installation of PyCogent/ensembldb3, sqlalchemy and Ensembl variation database.
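One step such a script needs is deciding whether a variant position falls inside an intron. The following is a minimal sketch, assuming intron locations are held as sorted, non-overlapping, 1-based inclusive (start, end) intervals on one chromosome. The function name in_intron and this interval layout are hypothetical and are not taken from sample_ensembl.py.

```python
from bisect import bisect_right


def in_intron(pos, introns):
    """Return True if pos lies inside one of the intervals in introns.

    introns: sorted, non-overlapping, 1-based inclusive (start, end) pairs
    (hypothetical layout chosen for this sketch).
    """
    starts = [start for start, _ in introns]
    # Index of the last interval whose start is <= pos (-1 if none).
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos <= introns[i][1]


introns = [(100, 200), (350, 500), (900, 1200)]
print([in_intron(p, introns) for p in (99, 100, 250, 500, 1000)])
```

For many lookups against the same chromosome, the starts list would normally be built once rather than on every call.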

File: count_variants_strand_aware.py
Type: Python 3.5 script
Purpose: Processes a list of variant details produced by sample_ensembl.py and generates a Counter object of variant counts by context.
Requirements: Output files from sample_ensembl.py.
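One common way to make context counts strand-aware is to report each variant relative to the gene's strand, reverse-complementing the context and both alleles when the gene lies on the minus strand. The sketch below assumes that convention and a hypothetical (context, ref, alt, strand) record layout with the context read on the plus strand; count_variants_strand_aware.py may define its keys and strand handling differently.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_by_context(variants):
    """Count variants keyed by (context, ref, alt), oriented to the gene's strand.

    variants: iterable of (context, ref, alt, strand), with context read on the
    plus strand and strand equal to +1 or -1 (hypothetical layout).
    """
    counts = Counter()
    for context, ref, alt, strand in variants:
        if strand == -1:
            context, ref, alt = revcomp(context), revcomp(ref), revcomp(alt)
        counts[(context, ref, alt)] += 1
    return counts


variants = [
    ("ACG", "C", "T", 1),   # C>T in ACG on a plus-strand gene
    ("CGT", "G", "A", -1),  # reads as C>T in ACG on the minus-strand gene
    ("TCA", "C", "G", 1),
]
print(count_by_context(variants))
```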

File: count_intronic_sites.py
Type: Python 3.5 script
Purpose: Scans the Ensembl database and creates a string which consists of the chained sequences of canonical introns.
Requirements: Installation of PyCogent/ensembldb3 and the Ensembl variation database.
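Canonical introns are the gaps between consecutive exons of a canonical transcript. The following minimal sketch derives intron intervals from exon coordinates and chains the intron sequences into one string. The exon layout, the toy chromosome string and the use of an "N" separator (so that no k-mer counted later spans the junction between two introns) are assumptions for illustration, not details taken from count_intronic_sites.py.

```python
def introns_from_exons(exons):
    """Return intron (start, end) pairs from exon (start, end) pairs.

    Coordinates are 0-based, half-open and on the plus strand; exons are
    assumed not to overlap.
    """
    exons = sorted(exons)
    return [(prev_end, next_start)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
            if next_start > prev_end]


def chain_introns(chromosome, transcripts, sep="N"):
    """Concatenate the intron sequences of several transcripts, separated by sep."""
    pieces = [chromosome[start:end]
              for exons in transcripts
              for start, end in introns_from_exons(exons)]
    return sep.join(pieces)


chromosome = "AAAAGTCCAGCCCCGTTTAGGGGG"
transcripts = [[(0, 4), (10, 14), (20, 24)]]
print(introns_from_exons(transcripts[0]))  # [(4, 10), (14, 20)]
print(chain_introns(chromosome, transcripts))
```

Introns of minus-strand transcripts would additionally need reverse-complementing if every sequence is to be read 5' to 3'.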

File: count_contexts_with_re.py
Type: Python 3.5 script
Purpose: Counts all k-mers in the intronic sequence generated by count_intronic_sites.py and stores the results in a Counter object.
Requirements: Output files from count_intronic_sites.py.
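The "_with_re" suffix suggests regular expressions. A zero-width lookahead such as (?=([ACGT]{k})) is one standard way to find overlapping k-mers with Python's re module. The minimal sketch below assumes that approach and also skips any k-mer containing a non-ACGT character, such as an N used as a separator or mask. The function name count_kmers is hypothetical.

```python
import re
from collections import Counter


def count_kmers(seq, k):
    """Count overlapping k-mers of seq that consist only of A, C, G and T."""
    # The lookahead consumes no characters, so successive matches can overlap.
    pattern = re.compile("(?=([ACGT]{%d}))" % k)
    return Counter(match.group(1) for match in pattern.finditer(seq.upper()))


counts = count_kmers("ACGTNACGAC", 3)
print(counts)
```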

File: merge_chromosome_data.py
Type: Python 3.5 script
Purpose: Merges the files for individual chromosomes produced by count_variants_strand_aware.py or count_contexts_with_re.py (or their intergenic equivalents) for a given set of chromosomes.
Requirements: Either the output files from count_variants_strand_aware.py or the output files from count_contexts_with_re.py.
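Because both kinds of input are Counter objects, merging per-chromosome results amounts to summing counters. The following minimal sketch assumes each chromosome's Counter was saved with pickle under a hypothetical file-name pattern. The actual file names and format used by merge_chromosome_data.py may differ.

```python
import os
import pickle
from collections import Counter


def merge_counters(counters):
    """Sum an iterable of Counter objects into a single Counter."""
    total = Counter()
    for counter in counters:
        total.update(counter)  # update() adds counts rather than replacing them
    return total


def merge_chromosome_files(directory, chromosomes, pattern="counts_chr{}.pkl"):
    """Load and merge pickled Counters for the given chromosomes.

    The file-name pattern is hypothetical; only load pickles you trust.
    """
    counters = []
    for chrom in chromosomes:
        with open(os.path.join(directory, pattern.format(chrom)), "rb") as handle:
            counters.append(pickle.load(handle))
    return merge_counters(counters)


merged = merge_counters([Counter({"ACG": 3, "TCA": 1}),
                         Counter({"ACG": 2, "GGA": 4})])
print(merged)
```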

File: bayes_analysis.py
Type: Python 3.5 script
Purpose: Samples posterior distributions of the variance due to context for various mutation types, using a Bayesian binomial model with a beta prior.
Requirements: Output files from merge_chromosome_data.py for both variants and contexts.
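Under a binomial model with a beta prior, the posterior for each context's mutation probability is again a beta distribution. With prior Beta(a, b), observing x variants at n sites gives the posterior Beta(a + x, b + n - x). The minimal sketch below samples these posteriors. As one possible definition of "variance due to context", it computes the site-weighted variance of the sampled probabilities across contexts for each draw. The uniform Beta(1, 1) prior, the weighting and all names are assumptions, not necessarily what bayes_analysis.py or shared/probpoly_bayes.py do.

```python
import random


def sample_context_variance(variant_counts, site_counts, draws=1000,
                            a=1.0, b=1.0, seed=0):
    """Return posterior draws of the site-weighted variance of per-context rates.

    variant_counts, site_counts: dicts keyed by context (hypothetical layout);
    a, b: parameters of the Beta prior shared by all contexts.
    """
    rng = random.Random(seed)
    contexts = sorted(site_counts)
    total_sites = sum(site_counts[c] for c in contexts)
    weights = [site_counts[c] / total_sites for c in contexts]
    samples = []
    for _ in range(draws):
        # Conjugate update: Beta(a + x, b + n - x) for x variants at n sites.
        rates = [rng.betavariate(a + variant_counts.get(c, 0),
                                 b + site_counts[c] - variant_counts.get(c, 0))
                 for c in contexts]
        mean = sum(w * r for w, r in zip(weights, rates))
        samples.append(sum(w * (r - mean) ** 2 for w, r in zip(weights, rates)))
    return samples


variants = {"ACG": 900, "TCA": 50, "GGA": 60}
sites = {"ACG": 10000, "TCA": 10000, "GGA": 10000}
draws = sample_context_variance(variants, sites)
print(sum(draws) / len(draws))
```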

File: aggregate_mutation_analysis.py
Type: Python 3.5 script
Purpose: Analyses variance due to context aggregated over all mutation directions for 1-mers, 3-mers, 5-mers and 7-mers.
Requirements: Variant and context counts produced by bayes_analysis.py.
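Aggregating over mutation directions and comparing context widths needs two bookkeeping steps: summing counts over the alternative allele, and collapsing a 7-mer context to its central 5-mer, 3-mer or 1-mer. The following minimal sketch assumes counts are keyed by (context, alt), with the mutated base at the centre of an odd-length context. The key layout and the name collapse are hypothetical.

```python
from collections import Counter


def collapse(counts, k):
    """Sum counts keyed by (context, alt) into counts keyed by the central k-mer.

    Mutation directions are aggregated by dropping alt; contexts are trimmed
    symmetrically to their central k bases (k odd, k <= context length).
    """
    result = Counter()
    for (context, _alt), n in counts.items():
        flank = (len(context) - k) // 2
        result[context[flank:flank + k]] += n
    return result


counts = Counter({("AACGTTA", "T"): 5, ("GACGTCC", "A"): 2, ("TTCAGGA", "G"): 1})
print(collapse(counts, 3))
print(collapse(counts, 1))
```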

File: sample_ensembl_intergenic_sites.py
Type: Python 3.5 script
Purpose: Analogous to count_intronic_sites.py, but records intergenic sites.
Requirements: Installation of PyCogent/ensembldb3 and Ensembl variation database.

File: sample_ensembl_intergenic_variants.py
Type: Python 3.5 script
Purpose: Analogous to sample_ensembl.py, but samples intergenic variants.
Requirements: Installation of PyCogent/ensembldb3, sqlalchemy and Ensembl variation database.

recombination folder

File: sample_ensembl_for_recombination.py
Type: Python 3.5 script
Purpose: Samples Ensembl and adds variant counts to the 10 kb bins used in deCODE recombination maps.
Requirements: Installation of PyCogent/ensembldb3, sqlalchemy, Ensembl variation database and deCODE recombination maps.
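Because the recombination rates are reported in 10 kb bins, each variant can be assigned to a bin by integer division of its position. The following minimal sketch assumes bins start at position 0 and are exactly 10,000 bp wide. The bin boundaries in the deCODE files should be checked, and the function name is hypothetical.

```python
from collections import Counter

BIN_SIZE = 10000  # 10 kb bins; boundary convention assumed for this sketch


def count_variants_per_bin(positions, bin_size=BIN_SIZE):
    """Return a Counter mapping bin start coordinate to number of variants."""
    return Counter((pos // bin_size) * bin_size for pos in positions)


print(count_variants_per_bin([5, 9999, 10000, 25000, 29999]))
```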

File: merge_male_and_female_recombination_rates.py
Type: Python 3.5 script
Purpose: Takes a table of variant data counted against 10 kb bins by sample_ensembl_for_recombination.py and adds columns for male and female recombination rates.
Requirements: Output files from sample_ensembl_for_recombination.py, male and female deCODE recombination maps.
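Adding the sex-specific rates is a join on the bin coordinates. The following minimal sketch uses plain dictionaries. It assumes each row is identified by (chromosome, bin start) and that bins missing from a map get None. The column names male_rate and female_rate are hypothetical. With pandas, a left join via DataFrame.merge(..., how="left") would do the same job.

```python
def add_recombination_rates(variant_rows, male_map, female_map):
    """Return copies of variant_rows with male and female rate columns added.

    variant_rows: list of dicts containing 'chrom' and 'bin_start' keys;
    male_map, female_map: dicts mapping (chrom, bin_start) to a rate.
    """
    merged = []
    for row in variant_rows:
        key = (row["chrom"], row["bin_start"])
        new_row = dict(row)  # copy so the input rows are left unchanged
        new_row["male_rate"] = male_map.get(key)
        new_row["female_rate"] = female_map.get(key)
        merged.append(new_row)
    return merged


rows = [{"chrom": "1", "bin_start": 0, "snvs": 12},
        {"chrom": "1", "bin_start": 10000, "snvs": 7}]
male = {("1", 0): 0.8, ("1", 10000): 1.1}
female = {("1", 0): 1.3}
result = add_recombination_rates(rows, male, female)
for row in result:
    print(row)
```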

File: ARMA_select_models.py
Type: Python 3.5 script
Purpose: Selects ARMA models to best fit the residuals for linear regression of SNV density on recombination rate for selected chromosomes.
Requirements: Output files from merge_male_and_female_recombination_rates.py.

File: ARMA_pq_analysis_all_by_chrom.py
Type: Python 3.5 script
Purpose: Uses MCMC to solve a linear regression of SNP rates against recombination rates using ARMA(p, q) residuals, aggregating variants across mutation direction.
Requirements: Output files from merge_male_and_female_recombination_rates.py and ARMA_select_models.py.

File: ARMA_pq_analysis.py
Type: Python 3.5 script
Purpose: Uses MCMC to solve a linear regression of SNP rates against recombination rates using ARMA(p, q) residuals, analysing each mutation direction for a single chromosome, and selects optimal models.
Requirements: Output files from merge_male_and_female_recombination_rates.py.
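A full ARMA(p, q) sampler is beyond a short example. The idea can be sketched with the simplest case instead: AR(1) residuals (p = 1, q = 0) and a hand-written random-walk Metropolis sampler on synthetic data. This is a swapped-in simplification. The priors, step sizes and model order are assumptions, not the sampler or settings used by ARMA_pq_analysis.py or ARMA_pq_analysis_all_by_chrom.py.

```python
import math
import random


def log_posterior(params, x, y):
    """Unnormalised log posterior for y = a + b*x + e with AR(1) errors e.

    params = (a, b, phi, log_s). Flat priors on a, b and log_s, uniform prior
    on phi in (-1, 1); likelihood conditional on the first observation.
    """
    a, b, phi, log_s = params
    if not -1.0 < phi < 1.0:
        return -math.inf
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    # Innovations u_t = e_t - phi * e_{t-1} are taken as iid Normal(0, s^2).
    ss = sum((e1 - phi * e0) ** 2 for e0, e1 in zip(resid, resid[1:]))
    n = len(resid) - 1
    return -n * log_s - ss / (2.0 * math.exp(2.0 * log_s))


def metropolis(x, y, start, steps, iterations=20000, burn_in=5000, seed=1):
    """Random-walk Metropolis sampler; returns the draws after burn-in."""
    rng = random.Random(seed)
    current = list(start)
    current_lp = log_posterior(current, x, y)
    draws = []
    for i in range(iterations):
        proposal = [p + rng.gauss(0.0, step) for p, step in zip(current, steps)]
        proposal_lp = log_posterior(proposal, x, y)
        # Accept with probability min(1, exp(proposal_lp - current_lp)).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        if i >= burn_in:
            draws.append(tuple(current))
    return draws


# Synthetic data: true intercept 1.0, slope 2.0, phi 0.5, innovation sd 0.5.
rng = random.Random(0)
x = [rng.uniform(0.0, 1.0) for _ in range(200)]
y, e = [], 0.0
for xi in x:
    e = 0.5 * e + rng.gauss(0.0, 0.5)
    y.append(1.0 + 2.0 * xi + e)

# Start the chain from the ordinary least squares estimates of a and b.
x_mean, y_mean = sum(x) / len(x), sum(y) / len(y)
b0 = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
      / sum((xi - x_mean) ** 2 for xi in x))
draws = metropolis(x, y, start=[y_mean - b0 * x_mean, b0, 0.0, 0.0],
                   steps=[0.05, 0.08, 0.04, 0.04])
slope_mean = sum(d[1] for d in draws) / len(draws)
phi_mean = sum(d[2] for d in draws) / len(draws)
print(slope_mean, phi_mean)
```

Higher-order models would replace the single phi term with p autoregressive and q moving-average coefficients. In practice a dedicated MCMC library is usually preferred to a hand-tuned random walk.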

File: Analyse residuals.ipynb
Type: Jupyter notebook
Purpose: Plots and analyses the residuals that result from ordinary least squares linear regression of SNP rates against recombination rates.
Requirements: statsmodels, output files from merge_male_and_female_recombination_rates.py.
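A quick check on OLS residuals from neighbouring bins is their lag-1 autocorrelation. Values well above zero indicate that the residuals are not independent, which is one reason to model them with ARMA terms instead. The following minimal sketch uses only the standard library rather than statsmodels. The function names and the toy data are hypothetical.

```python
def ols_fit(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b, residuals)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    residuals = [yi - a - b * xi for xi, yi in zip(x, y)]
    return a, b, residuals


def lag1_autocorrelation(values):
    """Sample autocorrelation of a series at lag 1."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    return sum(d0 * d1 for d0, d1 in zip(dev, dev[1:])) / denom


x = [float(i) for i in range(10)]
# A straight line plus a block-shaped misfit in the middle bins.
y = [1.0 + 0.5 * xi + (0.3 if 3 <= xi <= 6 else -0.2) for xi in x]
a, b, residuals = ols_fit(x, y)
print(round(b, 3), round(lag1_autocorrelation(residuals), 3))
```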

File: Estimate mutation rates from Jonsson data.ipynb
Type: Jupyter notebook
Purpose: Estimates mutation rates by chromosome from the data of Jónsson et al. (2017), "Parental influence on human germline de novo mutations in 1,548 trios from Iceland".
Requirements: Data from EMBL-EBI PRJEB21300.
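A per-generation, per-base-pair rate for an autosome can be estimated from de novo mutation counts. Divide the number of de novo mutations on the chromosome by the number of base pairs at risk: two transmitted haploid copies per trio times the number of base pairs at which mutations could be detected. The following minimal sketch shows that arithmetic. The numbers are placeholders, not values from Jónsson et al. (2017). The notebook may apply further corrections, and the X chromosome would need different handling because males carry only one copy.

```python
def mutation_rate(dnm_count, n_trios, callable_bp):
    """Per-base-pair, per-generation mutation rate from de novo mutation counts.

    Each trio's child carries two transmitted haploid copies of an autosome,
    so the number of base pairs at risk is 2 * n_trios * callable_bp.
    """
    return dnm_count / (2.0 * n_trios * callable_bp)


# Placeholder inputs for illustration only.
rate = mutation_rate(dnm_count=6000, n_trios=1548, callable_bp=200000000)
print("{:.2e}".format(rate))
```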

File: Mutations and recombination using OLSLR.ipynb
Type: Jupyter notebook
Purpose: Linear regression of SNV densities against recombination rates using ordinary least squares linear regression (OLSLR). For comparison purposes only.
Requirements: Output files from sample_ensembl_for_recombination.py.

shared folder

File: coordmapper.py
Type: Python 3.5 script
Purpose: Function to remap genomic coordinates using a Python implementation of UCSC LiftOver.
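LiftOver maps coordinates through chain files, which list aligned, gap-free blocks between two assemblies. The following minimal sketch illustrates that idea with a hypothetical in-memory list of blocks rather than a real chain file or any particular LiftOver package. Reverse-strand blocks are ignored, and positions that fall in a gap between blocks are reported as unmappable.

```python
from bisect import bisect_right


def make_mapper(blocks):
    """Build a coordinate mapper from aligned blocks.

    blocks: list of (source_start, target_start, size) for one chromosome,
    0-based and non-overlapping in the source assembly (hypothetical layout).
    """
    blocks = sorted(blocks)
    starts = [block[0] for block in blocks]

    def remap(pos):
        """Return the target coordinate of pos, or None if pos is unaligned."""
        i = bisect_right(starts, pos) - 1
        if i < 0:
            return None
        source_start, target_start, size = blocks[i]
        if pos >= source_start + size:
            return None  # pos lies in a gap between aligned blocks
        return target_start + (pos - source_start)

    return remap


remap = make_mapper([(0, 100, 50), (60, 170, 40)])
print([remap(p) for p in (0, 49, 55, 60, 99, 100)])
```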

File: probpoly_bayes.py
Type: Python 3.5 script
Purpose: Functions used by context/bayes_analysis.py.

File: recombination.py
Type: Python 3.5 script
Purpose: Common functions used by recombination scripts.

tables_and_figures folder

File: generate_tables.ipynb
Type: Jupyter notebook
Purpose: Generates LaTeX code for the manuscript tables.
Requirements: cogent3, various data files.
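Generating LaTeX table code from Python amounts to joining cells with & and ending each row with \\. The following minimal sketch uses only the standard library, independently of the cogent3 dependency listed above. The function name and example values are hypothetical, and the cell text is not escaped.

```python
def to_latex_tabular(header, rows, align=None):
    """Return LaTeX source for a simple tabular environment.

    Cell text is inserted verbatim: LaTeX special characters such as & or %
    are not escaped by this sketch.
    """
    align = align or "l" * len(header)
    lines = ["\\begin{tabular}{%s}" % align, "\\hline",
             " & ".join(header) + " \\\\", "\\hline"]
    for row in rows:
        lines.append(" & ".join(str(cell) for cell in row) + " \\\\")
    lines += ["\\hline", "\\end{tabular}"]
    return "\n".join(lines)


latex = to_latex_tabular(["Context", "Variants"], [["ACG", 12], ["TCA", 3]], align="lr")
print(latex)
```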

File: Plot figures for recombination and mutation.ipynb
Type: Jupyter notebook
Purpose: Plots figures for the manuscript relating to recombination.
Requirements: Matplotlib, seaborn, various data files.

File: plot_variance_by_mutation.ipynb
Type: Jupyter notebook
Purpose: Plots figures for the manuscript relating to the effect of context.
Requirements: Matplotlib, seaborn, various data files.
