Source code for "Quantifying Influences on Intragenomic Mutation Rate"
File: sample_ensembl.py
Type: Python 3.5 script
Purpose: Analyses the Ensembl variation database and creates files of variant details and intron location details to allow calculation of contextual influence.
Requirements: Installation of PyCogent/ensembldb3, sqlalchemy and Ensembl variation database.
File: count_variants_strand_aware.py
Type: Python 3.5 script
Purpose: Processes a list of variant details produced by sample_ensembl.py and generates a Counter object of variant counts by context.
Requirements: Output files from sample_ensembl.py.
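As an illustration of counting variants by context into a Counter, here is a minimal sketch. The record layout (context, ref, alt, strand) and the function names are hypothetical, and treating "strand aware" as reverse-complementing minus-strand records is an assumption, not a description of the actual script.

```python
from collections import Counter

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_variants_by_context(records):
    """Count variants keyed by (context, 'ref>alt').

    Each record is a hypothetical (context, ref, alt, strand) tuple.
    Records on the minus strand (strand == -1) are reverse-complemented
    first, so all counts are expressed relative to one strand.
    """
    counts = Counter()
    for context, ref, alt, strand in records:
        if strand == -1:
            context = reverse_complement(context)
            ref = reverse_complement(ref)
            alt = reverse_complement(alt)
        counts[(context, ref + ">" + alt)] += 1
    return counts


# Made-up example records, not real Ensembl data.
records = [("ACG", "C", "T", 1), ("CGT", "G", "A", -1), ("TTA", "T", "C", 1)]
variant_counts = count_variants_by_context(records)
```

In this toy input the minus-strand record CGT (G>A) is counted together with the plus-strand record ACG (C>T).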
File: count_intronic_sites.py
Type: Python 3.5 script
Purpose: Scans the Ensembl database and creates a string which consists of the chained sequences of canonical introns.
Requirements: Installation of PyCogent/ensembldb3 and the Ensembl variation database.
File: count_contexts_with_re.py
Type: Python 3.5 script
Purpose: Counts all kmers in the intronic sequence generated by count_intronic_sites.py and stores the results in a Counter object.
Requirements: Output files from count_intronic_sites.py.
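Counting overlapping k-mers with a regular expression can be sketched as follows. The function name and the choice to skip any window containing a non-ACGT character are assumptions for illustration; the actual script may handle ambiguous bases differently.

```python
import re
from collections import Counter


def count_kmers(sequence, k):
    """Count all overlapping k-mers made only of A, C, G and T.

    A zero-width lookahead lets the regex engine report a match at
    every start position, so overlapping k-mers are all counted.
    Windows containing other characters (e.g. N) do not match.
    """
    pattern = re.compile(r"(?=([ACGT]{%d}))" % k)
    return Counter(match.group(1) for match in pattern.finditer(sequence))


# Toy sequence; the N breaks three of the nine possible windows.
kmer_counts = count_kmers("ACGTACGNACG", 3)
```

Without the lookahead, `finditer` would return only non-overlapping matches and most k-mers would be missed.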
File: merge_chromosome_data.py
Type: Python 3.5 script
Purpose: Merges the files for individual chromosomes produced by count_variants_strand_aware.py or count_contexts_with_re.py (or their intergenic equivalents) for a given set of chromosomes.
Requirements: Either the output files from count_variants_strand_aware.py or the output files from count_contexts_with_re.py.
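Merging per-chromosome Counter objects amounts to summing counts key by key. The sketch below assumes the per-chromosome counts are already loaded into a dictionary keyed by chromosome name; how the real script reads and writes its files is not shown here.

```python
from collections import Counter


def merge_counts(per_chromosome, chromosomes):
    """Sum Counter objects for the selected chromosomes.

    per_chromosome: hypothetical dict mapping chromosome name -> Counter.
    chromosomes: iterable of chromosome names to include in the merge.
    """
    merged = Counter()
    for name in chromosomes:
        # Counter.update adds counts rather than replacing them.
        merged.update(per_chromosome[name])
    return merged


# Made-up counts for two chromosomes.
per_chromosome = {
    "1": Counter({"ACG": 10, "TTA": 4}),
    "2": Counter({"ACG": 7, "GGC": 2}),
}
merged_counts = merge_counts(per_chromosome, ["1", "2"])
```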
File: bayes_analysis.py
Type: Python 3.5 script
Purpose: Samples posterior distributions of variance due to context for various mutation types using a Bayesian binomial model with a beta prior.
Requirements: Output files from merge_chromosome_data.py for both variants and contexts.
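For a binomial likelihood with a Beta(a, b) prior, observing k variants at n sites gives a Beta(a + k, b + n - k) posterior for the mutation probability. The sketch below uses that conjugate update to draw posterior samples of the variance of rates across contexts. The uniform Beta(1, 1) prior, the unweighted variance and all names are assumptions for illustration, not the method implemented in bayes_analysis.py.

```python
import random
from statistics import pvariance


def sample_context_variance(variants, sites, draws=1000, a=1.0, b=1.0, seed=0):
    """Draw posterior samples of the variance of mutation rates across contexts.

    variants: dict mapping context -> number of variants observed (k).
    sites: dict mapping context -> number of sites with that context (n).
    a, b: parameters of the beta prior (Beta(1, 1) is an assumed default).
    """
    rng = random.Random(seed)
    contexts = sorted(sites)
    samples = []
    for _ in range(draws):
        # One draw per context from its Beta(a + k, b + n - k) posterior.
        rates = [
            rng.betavariate(a + variants.get(c, 0),
                            b + sites[c] - variants.get(c, 0))
            for c in contexts
        ]
        # Unweighted variance across contexts for this joint draw.
        samples.append(pvariance(rates))
    return samples


# Made-up counts for three contexts.
variants = {"ACG": 30, "ATG": 5, "CCG": 25}
sites = {"ACG": 1000, "ATG": 1000, "CCG": 1000}
variance_samples = sample_context_variance(variants, sites, draws=200)
```

The list of samples can then be summarised, for example by its mean or by quantiles, to give a posterior estimate and credible interval.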
File: aggregate_mutation_analysis.py
Type: Python 3.5 script
Purpose: Analyses variance due to context aggregated over all mutation directions for 1-mers, 3-mers, 5-mers and 7-mers.
Requirements: Variant and context counts produced by bayes_analysis.py.
File: sample_ensembl_intergenic_sites.py
Type: Python 3.5 script
Purpose: Analogous to count_intronic_sites.py, but records intergenic sites.
Requirements: Installation of PyCogent/ensembldb3 and Ensembl variation database.
File: sample_ensembl_intergenic_variants.py
Type: Python 3.5 script
Purpose: Analogous to sample_ensembl.py, but samples intergenic variants.
Requirements: Installation of PyCogent/ensembldb3, sqlalchemy and Ensembl variation database.
File: sample_ensembl_for_recombination.py
Type: Python 3.5 script
Purpose: Samples Ensembl and adds variant counts to the 10 kb bins used in deCODE recombination maps.
Requirements: Installation of PyCogent/ensembldb3, sqlalchemy, Ensembl variation database and deCODE recombination maps.
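Assigning variants to fixed-width 10 kb bins can be sketched with integer division. The coordinate convention (bins starting at position 0) and the function name are assumptions; the bin boundaries in the deCODE maps should be checked before reusing this.

```python
from collections import Counter

BIN_SIZE = 10000  # 10 kb bins


def count_variants_per_bin(positions, bin_size=BIN_SIZE):
    """Count variants per bin, keyed by the bin's start coordinate.

    positions: iterable of integer variant positions on one chromosome.
    Bins are assumed to be [0, bin_size), [bin_size, 2 * bin_size), ...
    """
    return Counter((pos // bin_size) * bin_size for pos in positions)


# Made-up positions on a single chromosome.
bin_counts = count_variants_per_bin([5, 9999, 10000, 25000, 25001])
```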
File: merge_male_and_female_recombination_rates.py
Type: Python 3.5 script
Purpose: Takes a table of variant data counted against 10 kb bins by sample_ensembl_for_recombination.py and adds columns for male and female recombination rates.
Requirements: Output files from sample_ensembl_for_recombination.py, male and female deCODE recombination maps.
File: ARMA_select_models.py
Type: Python 3.5 script
Purpose: Selects ARMA models to best fit the residuals for linear regression of SNV density on recombination rate for selected chromosomes.
Requirements: Output files from merge_male_and_female_recombination_rates.py.
File: ARMA_pq_analysis_all_by_chrom.py
Type: Python 3.5 script
Purpose: Uses MCMC to solve a linear regression of SNP rates against recombination rates using ARMA(p, q) residuals, aggregating variants across mutation direction.
Requirements: Output files from merge_male_and_female_recombination_rates.py and ARMA_select_models.py.
File: ARMA_pq_analysis.py
Type: Python 3.5 script
Purpose: Uses MCMC to solve a linear regression of SNP rates against recombination rates using ARMA(p, q) residuals, analysing each mutation direction for a single chromosome. Selects optimal models.
Requirements: Output files from merge_male_and_female_recombination_rates.py.
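For reference, a linear regression with ARMA(p, q) residuals, as named in the ARMA scripts above, has the following standard form. The symbols are chosen here for illustration (y_t for SNP rate in bin t, x_t for recombination rate in bin t), and the Gaussian white-noise term is an assumption; the scripts may parameterise the model differently.

```latex
y_t = \beta_0 + \beta_1 x_t + \varepsilon_t, \qquad
\varepsilon_t = \sum_{i=1}^{p} \phi_i \, \varepsilon_{t-i}
              + \eta_t + \sum_{j=1}^{q} \theta_j \, \eta_{t-j}, \qquad
\eta_t \sim \mathcal{N}(0, \sigma^2)
```

Setting p = q = 0 recovers ordinary least squares regression with independent errors.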
File: Analyse residuals.ipynb
Type: Jupyter notebook
Purpose: Plots and analyses the residuals that result from ordinary least squares linear regression of SNP rates against recombination rates.
Requirements: statsmodels, output files from merge_male_and_female_recombination_rates.py.
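The kind of residual check this notebook performs can be sketched without statsmodels. The pure-Python functions below are stand-ins written for illustration, not the notebook's code: they fit a one-predictor least squares line and compute the lag-1 autocorrelation of its residuals, which is one simple way to see whether neighbouring bins have correlated errors.

```python
def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (intercept, slope, residuals).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return intercept, slope, residuals


def lag1_autocorrelation(residuals):
    """Lag-1 autocorrelation of residuals (assumed to have mean zero)."""
    numerator = sum(a * b for a, b in zip(residuals[:-1], residuals[1:]))
    denominator = sum(r ** 2 for r in residuals)
    return numerator / denominator


# Made-up data: recombination rate (x) and SNP rate (y) for six bins.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.0, 1.2, 2.1, 2.9, 4.2, 5.0]
intercept, slope, residuals = ols_fit(x, y)
r1 = lag1_autocorrelation(residuals)
```

A lag-1 autocorrelation far from zero suggests the independence assumption of ordinary least squares does not hold, which is the motivation for modelling the residuals with an ARMA process.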
File: Estimate mutation rates from Jonsson data.ipynb
Type: Jupyter notebook
Purpose: Estimates mutation rates by chromosome from the data of Jónsson et al. (2017), "Parental influence on human germline de novo mutations in 1,548 trios from Iceland".
Requirements: Data from EMBL-EBI PRJEB21300.
File: Mutations and recombination using OLSLR.ipynb
Type: Jupyter notebook
Purpose: Linear regression of SNV densities against recombination rates using ordinary least squares linear regression (OLSLR). For comparison purposes only.
Requirements: Output files from sample_ensembl_for_recombination.py.
File: coordmapper.py
Type: Python 3.5 script
Purpose: Function to remap genetic coordinates using a Python implementation of UCSC LiftOver.
File: probpoly_bayes.py
Type: Python 3.5 script
Purpose: Functions used by context/bayes_analysis.py.
File: recombination.py
Type: Python 3.5 script
Purpose: Common functions used by recombination scripts.
File: generate_tables.ipynb
Type: Jupyter notebook
Purpose: Generate LaTeX code for manuscript tables.
Requirements: cogent3, various data files.
File: Plot figures for recombination and mutation.ipynb
Type: Jupyter notebook
Purpose: Plot figures for manuscript relating to recombination.
Requirements: Matplotlib, seaborn, various data files.
File: plot_variance_by_mutation.ipynb
Type: Jupyter notebook
Purpose: Plot figures for manuscript relating to the effect of context.
Requirements: Matplotlib, seaborn, various data files.