'Arama'

GUI-based visualisation software for quickly and interactively exploring high-dimensional data, with functionality to identify genomic signals and relate them to user-specified phenotypes.

Background

Early exploratory analysis of high-dimensional data typically involves transforming the data into a reduced-dimensional space for visualisation. Inspecting this transformed view highlights, in an intuitive way, global patterns in the data, such as whether the samples cluster in accordance with the hypotheses. Batch effects, sample mismatches and other technical artefacts are also exposed by this visualisation. An unsupervised dimensionality reduction method, such as principal components analysis (PCA), produces views of the data that are not biased by our hypotheses. PCA creates a low-dimensional representation of a data set that is optimal in the sense of retaining as much of the variance in the original data as possible for a given number of dimensions. The principal components are ordered by the amount of variance they capture, with the first component capturing the most. Plotting the principal components shows how samples cluster on each dimension, with samples that cluster together being alike on that dimension. This allows users to discover visually, in an unbiased manner, variables that are characteristic of specific sample groups.

Often, this unbiased view reveals unexpected insights into the data. It would be particularly useful to characterise these insights further: to determine why samples cluster or segregate on given dimension(s), and whether this is related to a phenotype or to a technical factor of the experiment. Further, if the clustering is correlated with a phenotype of interest, which genes, transcripts, methylated CpG sites and so on are driving it? Capturing this information would make exploratory data analysis far more powerful, generating new hypotheses and analytical questions for the next phase of the analysis.
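As a minimal sketch (Python with NumPy; this is not Arama's own code), PCA can be computed from the singular value decomposition of the centred samples-by-features matrix. The squared singular values give each component's variance, so the components come out already ordered from most to least variance. The helper names `pca`, `top_features` and `pc_phenotype_correlation` are hypothetical. Ranking features by absolute loading is one simple way to ask which features drive a component, and Pearson correlation with a numeric phenotype is one simple way to ask whether a component relates to that phenotype; the simulated data are illustrative only.

```python
import numpy as np


def pca(X, n_components=None):
    """PCA of a samples-by-features matrix X via SVD (hypothetical helper).

    Returns (scores, loadings, ratio), ordered from highest to lowest variance.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                             # centre each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # s is sorted descending
    var = s**2 / (X.shape[0] - 1)                      # variance captured by each PC
    ratio = var / var.sum()                            # fraction of total variance
    scores = U * s                                     # sample coordinates on each PC
    loadings = Vt.T                                    # feature weights for each PC
    if n_components is not None:
        scores = scores[:, :n_components]
        loadings = loadings[:, :n_components]
        ratio = ratio[:n_components]
    return scores, loadings, ratio


def top_features(loadings, pc, k=5):
    """Indices of the k features with the largest absolute loading on PC `pc` (0-based)."""
    return np.argsort(-np.abs(loadings[:, pc]))[:k]


def pc_phenotype_correlation(scores, phenotype):
    """Pearson correlation between each PC's scores and a numeric phenotype."""
    y = np.asarray(phenotype, dtype=float)
    return np.array([np.corrcoef(scores[:, j], y)[0, 1] for j in range(scores.shape[1])])


# Illustrative simulated data: 20 samples, 50 features, a hypothetical binary phenotype.
rng = np.random.default_rng(0)
n, p = 20, 50
group = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[:, :5] += 5 * group[:, None]        # only features 0-4 differ between the groups

scores, loadings, ratio = pca(X, n_components=5)
print(ratio)                                     # variance fractions, largest first
print(top_features(loadings, pc=0))              # features driving PC1
print(pc_phenotype_correlation(scores, group))   # association of each PC with the phenotype
```

Here the group difference dominates the variance, so PC1 separates the two groups and its largest loadings fall on the five shifted features.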

Features

To explore all principal components (PCs) concurrently across all n samples, we present a scatterplot with the PC order on the y-axis. To explore one or two PCs in more detail, we present a standard 2D scatterplot. To highlight clusters, users can map phenotypes of their choice to colour, shape or point size by selecting them from drop-down menus. The two plots interact with each other, and clusters can be defined with a selection tool.
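The sketch below (Python with NumPy; hypothetical helper names, not Arama's implementation) shows one way the data behind these linked views could be organised. It assumes that, in the all-PC plot, each point's x position is a sample's score and its y position is the PC index. A phenotype is mapped to colours, a box drawn on the 2D plot of two PCs returns the selected samples, and those samples are then flagged in the all-PC plot.

```python
import numpy as np


def all_pc_points(scores, phenotype):
    """Long-format rows for the all-PC plot: x = score, y = PC index (1-based)."""
    n_samples, n_pcs = scores.shape
    return [
        {"sample": i, "pc": pc + 1, "score": float(scores[i, pc]), "phenotype": phenotype[i]}
        for pc in range(n_pcs)
        for i in range(n_samples)
    ]


def colour_by(phenotype, palette=("tab:blue", "tab:orange", "tab:green", "tab:red")):
    """Map each phenotype level to a colour name (palette is an assumption)."""
    levels = sorted(set(phenotype))
    return {level: palette[j % len(palette)] for j, level in enumerate(levels)}


def select_box(scores, pc_x, pc_y, xlim, ylim):
    """Samples inside a rectangle drawn on the 2D plot of PC pc_x vs PC pc_y (1-based)."""
    x, y = scores[:, pc_x - 1], scores[:, pc_y - 1]
    inside = (x >= xlim[0]) & (x <= xlim[1]) & (y >= ylim[0]) & (y <= ylim[1])
    return np.flatnonzero(inside)


def highlight(rows, selected):
    """Flag rows of the all-PC plot whose sample was selected in the 2D plot."""
    chosen = {int(i) for i in selected}
    return [dict(row, selected=row["sample"] in chosen) for row in rows]


# Toy PC scores for 4 samples on 3 PCs, with a hypothetical phenotype.
scores = np.array([[2.0, 0.5, 0.1],
                   [1.8, -0.4, 0.0],
                   [-2.1, 0.3, -0.2],
                   [-1.7, -0.6, 0.2]])
phenotype = ["case", "case", "control", "control"]

rows = all_pc_points(scores, phenotype)
colours = colour_by(phenotype)   # {'case': 'tab:blue', 'control': 'tab:orange'}
selected = select_box(scores, pc_x=1, pc_y=2, xlim=(1.0, 3.0), ylim=(-1.0, 1.0))
linked = highlight(rows, selected)
```

Keeping one long-format row per (sample, PC) pair means a selection made in either view reduces to a set of sample indices that both plots can share.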

JasonR055/arama