JasonR055/arama

'Arama'

GUI-based visualisation software for quickly and interactively exploring high-dimensional data, with functionality to identify genomic signals and relate them to user-specified phenotypes.

Background

Early exploratory analysis of high-dimensional data typically involves transforming the data to a reduced-dimensional space for visualisation. This transformation and visual inspection highlights, in an intuitive way, global patterns in the data, such as whether the samples cluster in accordance with the hypotheses. Batch effects, sample mismatches and other technical artefacts are also highlighted by this visualisation.

An unsupervised dimensionality reduction method, such as principal components analysis (PCA), produces views of the data that are unbiased by our hypotheses. PCA creates a low-dimensional representation of a data set that is optimal in the sense of retaining as much of the variance in the original data set as possible. The principal components are ordered by the amount of variance they explain, from highest to lowest. Plotting principal components shows how samples cluster on each dimension, with clustering illustrative of 'likeness' on that dimension. This allows users to visually discover, in an unbiased manner, variables that are characteristic of specific sample groups. Often, this unbiased view reveals unexpected insights into the data.

It would be particularly useful to further characterise these insights and determine why samples are clustering or segregating on given dimension(s), and whether this is related to a phenotype or an experimental technical factor. Further, if this clustering is correlated with a phenotype of interest, which genes, transcripts, methylated CpG sites and so on are driving it? Capturing this information would lead to a far more powerful exploratory data analysis, one which generates new hypotheses and analytical questions for the next phase of the analysis.
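
The PCA step described above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not Arama's own implementation: the function names pca and top_features, the samples-by-features matrix layout, and the toy data and gene names are all hypothetical. It shows how components come out ordered by explained variance, and how the features with the largest absolute loadings on a component can be listed as candidates for what is driving a cluster.

```python
import numpy as np


def pca(data):
    """PCA via SVD of the column-centred matrix (rows = samples, columns = features)."""
    centred = data - data.mean(axis=0)
    # centred = U * diag(s) * Vt; the rows of Vt are the principal axes (loadings).
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = u * s                                # sample coordinates on each PC
    variance = s ** 2 / (data.shape[0] - 1)       # variance captured by each PC
    explained = variance / variance.sum()         # fraction of total variance
    return scores, vt, explained


def top_features(loadings, pc, names, k=3):
    """Names of the k features with the largest absolute loading on one PC."""
    order = np.argsort(-np.abs(loadings[pc]))[:k]
    return [names[i] for i in order]


# Toy data (hypothetical): 6 samples x 4 features, where gene_a and gene_b
# separate the first three samples from the last three.
rng = np.random.default_rng(0)
data = rng.normal(scale=0.1, size=(6, 4))
data[:3, 0] += 5.0
data[:3, 1] -= 5.0
names = ["gene_a", "gene_b", "gene_c", "gene_d"]

scores, loadings, explained = pca(data)
print(explained.round(3))                    # variance fraction per PC, largest first
print(top_features(loadings, 0, names, k=2)) # features driving PC1
```

In this toy case the two sample groups fall on opposite sides of the first component, and the two features that were shifted between groups carry the largest loadings on it.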

Features

To explore all principal components (PCs) concurrently across the n samples, we present a scatterplot with the PC order on the y-axis. To explore one or two PCs in more detail, we present a standard 2D scatterplot. To highlight clusters, user-specified phenotypes can be mapped to colour, shape or point size, selected from drop-down menus. The two graphs are linked, and clusters can be defined with a selection tool.
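
The data preparation behind these views can be sketched in plain Python, independent of any GUI toolkit. Everything here is an assumption made for illustration rather than Arama's actual code: the function names overview_points, map_to_colour and select_samples, the choice of the sample score as the x-axis of the overview plot, the rectangular selection, and the colour names.

```python
def overview_points(scores):
    """One point per (sample, PC): x = score on that PC, y = PC order (1-based)."""
    return [
        (value, pc + 1, sample)
        for sample, row in enumerate(scores)
        for pc, value in enumerate(row)
    ]


def map_to_colour(phenotype, palette=("tab:blue", "tab:orange", "tab:green")):
    """Assign one colour per distinct phenotype level, in order of first appearance."""
    levels = list(dict.fromkeys(phenotype))
    if len(levels) > len(palette):
        raise ValueError("palette has fewer colours than phenotype levels")
    lookup = dict(zip(levels, palette))
    return [lookup[level] for level in phenotype]


def select_samples(points, x_range, y_range):
    """Indices of samples with at least one point inside a rectangular selection."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    return sorted(
        {s for x, y, s in points if x_lo <= x <= x_hi and y_lo <= y <= y_hi}
    )


# Hypothetical PCA scores: 4 samples x 2 PCs, with one phenotype label per sample.
scores = [[3.1, 0.2], [2.9, -0.1], [-3.0, 0.3], [-3.0, -0.4]]
phenotype = ["case", "case", "control", "control"]

points = overview_points(scores)
colours = map_to_colour(phenotype)
# Select the samples with high scores on PC1 in the overview plot.
selected = select_samples(points, x_range=(2.0, 4.0), y_range=(1, 1))
print(selected, [colours[s] for s in selected])
```

Because the selection returns sample indices rather than plot coordinates, the same indices could be used to highlight those samples in the detailed 2D scatterplot, which is one simple way to keep two views linked.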

About

Arama - Interactive exploration of global signals in high dimensional data.
