Source code for analysing gene expression time-series data from mouse experiments performed by Jingwen Lin/Jean Langhorne at the Francis Crick Institute, London
- Authors: John Joseph Valletta and Mario Recker
- Data: Microarray Illumina Mouse WG6 v2 (45,281 probe sets representing 30,854 genes)
- Contacts: Jingwen Lin and Jean Langhorne, Francis Crick Institute, London
Contains all data sets used in the analysis.
acleaning.py: Cleans up Original data to produce Processed and Log2FC
aparse_immune_genes.R: Parses MSigDB's .gmt file into a list of genes
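To illustrate what parsing a .gmt file into gene lists involves, here is a minimal Python sketch. It is not the repository's R script. It only assumes the standard .gmt layout: one gene set per line, tab-delimited, with the set name first, a description second and the genes after that. The function name parse_gmt and the example gene sets are made up for illustration.

```python
import io


def parse_gmt(handle):
    """Return {gene_set_name: [gene, ...]} from an open .gmt file handle."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = [g for g in genes if g]  # drop empty trailing fields
    return gene_sets


# Tiny made-up example in .gmt layout (set names and genes are illustrative only)
example = io.StringIO(
    "SET_A\thttp://example.org/a\tThrsp\tIl6\tTnf\n"
    "SET_B\tna\tIfng\tIl10\n"
)
gene_sets = parse_gmt(example)
print(gene_sets["SET_A"])
```

With a real file, the same function could be called as parse_gmt(open(path)), where path points at the downloaded .gmt file.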
Data as received from Jingwen or downloaded from original source.
BL/SP-all.entities.txt: Microarray data normalised to the median across all the chips and log2 transformed, supplied by Jingwen Lin
BL/SP-2ANOVA.csv: Two-way ANOVA across time-points from GeneSpring, supplied by Jingwen Lin
MouseWG-6_V2_0_R3_11278593_A.txt: Annotation file downloaded from Illumina to map probe IDs to gene symbols. Note:
- Entrez_Gene_ID --> EntrezID (e.g. 212772)
- Symbol --> GeneSymbol (e.g. Thrsp)
- Probe_Id --> IlluminaID (e.g. )
- Array_Address_Id --> ProbeID (e.g. 2600193)
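The column renaming in the note can be sketched in Python as follows. This is an illustration, not the repository's code. It assumes the probe table has already been isolated as a plain tab-delimited table with a single header row, which may not match the layout of the raw Illumina file. The names COLUMN_MAP and read_annotation are hypothetical, and the example row reuses the example values from the note plus a placeholder probe identifier; the values are not claimed to belong to one real probe.

```python
import csv
import io

# Original annotation column -> name used in this project (taken from the note above)
COLUMN_MAP = {
    "Entrez_Gene_ID": "EntrezID",
    "Symbol": "GeneSymbol",
    "Probe_Id": "IlluminaID",
    "Array_Address_Id": "ProbeID",
}


def read_annotation(handle):
    """Read a tab-delimited probe table and keep only the renamed columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [{new: row[old] for old, new in COLUMN_MAP.items()} for row in reader]


# Illustrative one-row table; "PLACEHOLDER_ID" is not a real Illumina identifier
example = io.StringIO(
    "Entrez_Gene_ID\tSymbol\tProbe_Id\tArray_Address_Id\tOther\n"
    "212772\tThrsp\tPLACEHOLDER_ID\t2600193\tignored\n"
)
annotation = read_annotation(example)
print(annotation[0])
```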
Original data cleaned up by acleaning.py
- Blood/Spleen.csv: Reduced versions of BL/SP-all.entities.txt as follows:
- Blood/Spleen AS/CB.csv: Same as Blood/Spleen.csv but split by strain, with ExcelSymbol removed and columns renamed as Day.NReplicate, i.e. 3.2 -> day 3, replicate 2
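The Day.NReplicate column naming can be decoded with a few lines of Python. This is a minimal sketch; the helper names parse_column and group_by_day are hypothetical and not taken from the repository. It assumes labels are kept as strings: if they were read as numbers, a label such as 3.10 would collapse to 3.1.

```python
from collections import defaultdict


def parse_column(label):
    """Split a 'Day.Replicate' label such as '3.2' into (day, replicate)."""
    day, replicate = label.split(".")
    return int(day), int(replicate)


def group_by_day(labels):
    """Group column labels by day, preserving their original order."""
    groups = defaultdict(list)
    for label in labels:
        day, _replicate = parse_column(label)
        groups[day].append(label)
    return dict(groups)


print(parse_column("3.2"))                   # (3, 2)
print(group_by_day(["0.1", "0.2", "3.1"]))   # {0: ['0.1', '0.2'], 3: ['3.1']}
```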
Blood/Spleen AS/CB.csv: Same as Processed data but with log2 fold change computed (day 0 and day 12 naive mice pooled together)
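Because the expression values are already log2 transformed, a log2 fold change against a baseline reduces to a subtraction. The sketch below assumes the baseline is the mean of the pooled naive columns. That is one plausible reading of "pooled together" and not necessarily what acleaning.py does. The function name and the column label "12N.1" for a day 12 naive replicate are hypothetical.

```python
from statistics import mean


def log2_fold_change(row, naive_columns):
    """Subtract the mean of the naive (baseline) columns from every other column.

    row: {column_label: log2 expression value} for one probe.
    naive_columns: labels of the pooled naive replicates (assumed baseline).
    """
    baseline = mean(row[c] for c in naive_columns)
    return {c: v - baseline for c, v in row.items() if c not in naive_columns}


# Made-up values for a single probe
row = {"0.1": 8.0, "0.2": 8.5, "12N.1": 7.5, "3.1": 10.0, "3.2": 9.0}
naive = ["0.1", "0.2", "12N.1"]
print(log2_fold_change(row, naive))  # {'3.1': 2.0, '3.2': 1.0}
```

Averaging in log space corresponds to a geometric mean on the raw intensity scale. Averaging raw intensities first and then taking the log would give a slightly different baseline.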
Deconvolution/CIBERSORT "ready" files created by adecon_ready.R
srep40508-s1.csv: The downloaded signature matrix
SigMatrix.txt: Same as srep40508-s1.csv but tab-delimited (as required by the CIBERSORT function)
Blood/Spleen AS/CB.txt: Same as Processed data but tab-delimited and anti-logged (i.e. 2^x) (see CIBERSORT documentation), retaining only the top-ranked probe for probes mapping to the same gene symbol (as ranked by the Gaussian Process fit).
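The preparation steps for the CIBERSORT-ready mixture files (keep one probe per gene symbol, anti-log, write tab-delimited) can be sketched in Python as below. This is an illustration only; the repository does this in R. The ranking is passed in as a plain dictionary with lower numbers meaning better, which stands in for the Gaussian Process ranking and is an assumption. The function names, probe IDs and values are made up.

```python
import csv
import io

ID_COLUMNS = ("GeneSymbol", "ProbeID")


def decon_ready(rows, rank):
    """Keep the best-ranked probe per gene symbol and anti-log its values.

    rows: list of dicts with 'GeneSymbol', 'ProbeID' and log2 sample columns.
    rank: {ProbeID: rank}, lower is better (hypothetical stand-in ranking).
    """
    best = {}
    for row in rows:
        symbol = row["GeneSymbol"]
        if symbol not in best or rank[row["ProbeID"]] < rank[best[symbol]["ProbeID"]]:
            best[symbol] = row
    out = []
    for symbol, row in best.items():
        values = {k: 2.0 ** v for k, v in row.items() if k not in ID_COLUMNS}
        out.append({"GeneSymbol": symbol, **values})
    return out


def to_tsv(rows):
    """Serialise the rows as tab-delimited text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]),
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up example: two probes map to Thrsp, and p2 is ranked higher than p1
rows = [
    {"GeneSymbol": "Thrsp", "ProbeID": "p1", "3.1": 3.0},
    {"GeneSymbol": "Thrsp", "ProbeID": "p2", "3.1": 4.0},
    {"GeneSymbol": "Il6", "ProbeID": "p3", "3.1": 1.0},
]
rank = {"p1": 2, "p2": 1, "p3": 3}
ready = decon_ready(rows, rank)
print(to_tsv(ready))
```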
Perform cell type deconvolution using CIBERSORT and the signature matrix of Chen et al.
cibersort.R: CIBERSORT source code
cell_type.R: Script to run CIBERSORT on all data sets (Blood/Spleen AS/CB)
plot_cell_type_results.R: Plots stacked bar plots
GPy-1.7.7, paramz-0.7.4, GPclust-0.1.0: Core modules for Gaussian Process Modelling (Sheffield Machine Learning Group)
script.py: Top-level script to cluster gene expression data (computationally expensive)
analysis.py: Top-level script to analyse the clustering results (compare time-profiles, gene enrichment analysis, etc.)
reactome_ready.py: A very short script to take each gene list (cluster) and save it as a .txt file so that it can be imported in Cytoscape/ReactomePlugin
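Exporting each cluster's gene list to its own .txt file could look like the sketch below. It is an illustration, not the contents of reactome_ready.py. The one-gene-per-line layout, the cluster_<name>.txt file naming and the function name save_clusters are all assumptions.

```python
import os
import tempfile


def save_clusters(clusters, out_dir):
    """Write each cluster's genes to '<out_dir>/cluster_<name>.txt', one per line."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for name, genes in clusters.items():
        path = os.path.join(out_dir, f"cluster_{name}.txt")
        with open(path, "w") as fh:
            fh.write("\n".join(genes) + "\n")
        paths.append(path)
    return paths


# Made-up clusters written to a temporary folder
clusters = {1: ["Thrsp", "Il6"], 2: ["Ifng"]}
out_dir = tempfile.mkdtemp()
paths = save_clusters(clusters, out_dir)
print([os.path.basename(p) for p in paths])  # ['cluster_1.txt', 'cluster_2.txt']
```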
All the modules used to analyse the data
config.py: Configuration file containing all constants used
input_output.py: Input/output functions (creating folders, saving to pdf, pickle, etc.)
gaussian_process.py: Wrapper functions for Gaussian Process Models
util.py: Various utility functions
enrichr: Gene set enrichment analysis using Enrichr
compare: Functions to compare time-profiles of genes or clusters across experimental conditions