Skip to content

jjvalletta/Jingwen

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Mice Microarray Data Analysis

Source code for analysing gene expression time-series data for mice experiments performed by Jingwen Lin/Jean Langhorne at Francis Crick Institute, London

  • Authors: John Joseph Valletta and Mario Recker
  • Data: Microarray Illumina Mouse WG6 v2 (45,281 probe sets representing 30,854 genes)
  • Contacts: Jingwen Lin and Jean Langhorne, Francis Crick Institute, London

Data

Contains all data sets used in the analysis.

  • acleaning.py: Cleans up Original data to produce Processed and Log2FC
  • aparse_immune_genes.R: Parses MSigDB's .gmt file to list of genes

Original

Data as received from Jingwen or downloaded from original source.

  • BL/SP-all.entities.txt: Microarray data normalised to the median across all the chips and log2 transformed supplied by Jingwen Lin
  • BL/SP-2ANOVA.csv: Two-way ANOVA across time-points from GeneSpring supplied by Jingwen Lin
  • MouseWG-6_V2_0_R3_11278593_A.txt: Annotation file downloaded from Illumina here to map probe IDs to gene symbols. Note:
    • Entrez_Gene_ID --> EntrezID (e.g 212772)
    • Symbol --> GeneSymbol (e.g Thrsp)
    • Probe_Id --> IlluminaID (e.g )
    • Array_Address_Id --> ProbeID (e.g 2600193)

Processed

Original data cleaned up by acleaning.py

  • Blood/Spleen.csv: Reduced versions of BL/SP-all.entities.txt as follows:

    1. Remove meta data columns apart from ProbeID, Symbol and data
    2. Fix issue with gene names see here and here
    3. Note: ExcelSymbol = original gene symbol (potentially incorrect), Symbol = corrected gene symbol
  • Blood/Spleen AS/CB.csv: Same as Blood/Spleen.csv but split by strain, ExcelSymbol removed and columns renamed as Day.NReplicate i.e 3.2 -> day 3, replicate 2

Log2FC

  • Blood/Spleen AS/CB.csv: Same as Processed data but with log 2 fold change computed (day 0 and day 12 naive mice pooled together)

DeconReady

Deconvolution/CIBERSORT "ready" files created by adecon_ready.R

  • srep40508-s1.csv: The signature matrix downloaded from here
  • SigMatrix.txt: Same as srep40508-s1.csv but tab-delimited (as required by the CIBERSORT function)
  • Blood/Spleen AS/CB.txt: Same as Processed data but tab-delimited and anti-logged (i.e 2^x) (see CIBERSORT documentation) and retaining only the top ranked probe for probes mapping to the same gene symbol (as ranked by the Gaussian Process fit).

CellType

Perform cell type deconvolution using CIBERSORT and the signature matrix of Chen et al.

  • cibersort.R: CIBERSORT source code
  • cell_type.R: script to run CIBERSORT on all data sets (Blood/Spleen AS/CB)
  • plot_cell_type_results.R: plot stacked bar plots

Analysis

Dependencies

Top-level scripts

  • script.py: Top-level script to cluster gene expression data (computationally expensive)
  • analysis.py: Top-level script to analyse the clustering results (compare time-profiles, gene enrichment analysis, etc.)
  • reactome_ready.py: A very short script to take each gene list (cluster) and save it as a .txt file so that it can be imported in Cytoscape/ReactomePlugin

Custom modules

All the modules used to analyse the data

  • config.py: Configuration file containing all constants used
  • input_output.py: Input/output functions (creating folders, saving to pdf, pickle etc)
  • gaussian_process.py: Wrapper functions for Gaussian Process Models
  • util.py: Various utility functions
  • enrichr: Gene set enrichment analysis using Enrichr
  • compare: Functions to compare time-profiles of genes or clusters across experimental conditions

About

Mice microarray gene expression time-series analysis

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published