Repository for Extensively parameterized mutation–selection models reliably capture site-specific selective constraint, by SJS* and COW.
Please cite the following article, as needed: http://mbe.oxfordjournals.org/content/early/2016/08/10/molbev.msw171.abstract
- `data/` contains all simulated alignments, and the 512-taxon balanced trees (with branch lengths of either 0.5 or 0.01) used during simulation. Alignments are named with this format: `<dataset_name>_bl<0.5/0.01>.phy`, where `bl` indicates the branch lengths of the tree used for simulation.
- `simulation/` contains all code used for simulating sequences, as well as simulating parameters for use in sequence simulation.
  - `ramsey2011_alignments` contains all sequence alignments from Ramsey et al. (2011).
  - `derive_natural_simulation_parameters.py` derives parameters for simulating natural sequences.
  - `derive_dms_simulation_parameters.py` derives parameters for simulating DMS sequences. Note that experimental preferences are in the directory `true_simulation_parameters`.
  - `true_simulation_parameters` contains all true parameters for simulation, including true dN/dS and entropy, amino-acid fitness, codon frequencies, and selection coefficients.
  - `simulate_alignments.py` simulates a sequence alignment, specifically on UT's (now defunct) PhyloCluster.
- `inference/` contains all code used for mutation-selection model inference. All scripts named `*.sh` and `*.qsub` are used for submitting jobs to UT's PhyloCluster, and all `*.py` scripts conduct and process inferences.
- `results/` contains all inference results.
  - `swmutsel/` contains all inference results with swMutSel for a variety of penalizations, indicated in the file name. The script `./results/extract_sw_fitness.py` extracts fitness values from the swMutSel MLE inferences into separate text files for later use.
  - `phylobayes/` contains all inference results with pbMutSel.
- `postprocessing/` contains all code used to process, analyze, and plot data (mostly `R` scripts). All generated plots are also in this directory. All scripts should be executed from this directory! Note: R code requires the packages (and their dependencies) cowplot, ggrepel, dplyr, tidyr, readr, grid, lme4, multcomp, and lmerTest.
  - `calculate_inferred_quantities.py` calculates dN/dS, entropy, selection coefficient distributions, and JSD for inferences in `results/`. Resulting quantities are in the subdirectory `dataframes`.
  - `process_results.R` processes inference results in `dataframes` to create the final csv file `inference_results.csv`.
  - `plot_figures.R` makes all the figures in the manuscript. Figures are saved in either the subdirectory `maintext_figures/` or `SI_figures/`.
- `universal_functions.py` is a Python module containing various functions used throughout the repository.

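As a rough illustration of the alignment naming scheme described for `data/`, the sketch below splits a file name into its dataset name and branch length. It is not code from this repository: the function `parse_alignment_name`, the `AlignmentInfo` type, the regular expression, and the dataset name `example` are all hypothetical, and the sketch assumes that file names follow the stated format exactly.

```python
import os
import re
from typing import NamedTuple

# Assumed pattern for "<dataset_name>_bl<0.5/0.01>.phy". The regex is
# illustrative only and is not taken from the repository's scripts.
ALIGNMENT_NAME = re.compile(r"^(?P<dataset>.+)_bl(?P<bl>0\.5|0\.01)\.phy$")


class AlignmentInfo(NamedTuple):
    dataset: str          # the <dataset_name> part of the file name
    branch_length: float  # branch length of the simulation tree (0.5 or 0.01)


def parse_alignment_name(path: str) -> AlignmentInfo:
    """Split an alignment file name into dataset name and branch length."""
    filename = os.path.basename(path)  # ignore any leading directories
    match = ALIGNMENT_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected alignment file name: {filename!r}")
    return AlignmentInfo(match.group("dataset"), float(match.group("bl")))


if __name__ == "__main__":
    # "example" is a made-up dataset name used only for illustration.
    print(parse_alignment_name("data/example_bl0.01.phy"))
```

Rejecting names that do not match, rather than guessing, keeps a mistyped branch length from silently ending up in downstream analyses.
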
*This repository is maintained by SJS. Please file any questions/comments in Issues, or contact me at stephanie.spielman@gmail.com.