Title: Hierarchical visualization of high dimensional data : interactive exploration of 'omics type data
Authors: Macquisten, Alexander Michael
Issue Date: 2022
Publisher: Newcastle University
Abstract: Our ability to investigate biological entities has been improving over the years, thanks to next-generation sequencing technologies providing an ever-increasing efficiency of data collection. However, this superior data collection hasn’t necessarily led to superior knowledge generation. Across the different fields of biological study, data is often high dimensional, with a single entity of interest corresponding to a single dimension in the dataset. Datasets with more than a thousand dimensions are not uncommon, and visualising them without sacrificing some of the data is challenging. This thesis covers the creation of methods to support the visual exploration and analysis of high dimensional biological data, supporting users in applying their domain knowledge to discover patterns of interest. The first contribution of this research is a study comparing five different hierarchical visualization methods for how well they represent the underlying dataset at different scales of hierarchical structure. This was used to inform decisions in developing a method consisting of two linked views to support the exploration and selection of subsets of data: a Sunburst Chart provides an overview by displaying the full dataset, and is complemented by a Treemap that displays a subset of the hierarchy consisting of the currently selected node and its local structure of child nodes. The second contribution is a pair of studies evaluating different statistical distributions for representing multivariate data features in aggregated data. The first was an initial usability study comparing two variations of a novel glyph combining a box plot with a density distribution, called a density box plot, while the second study compared the glyph to box plots and density distribution plots independently. Following these studies, the density box plot was added within each treemap node to show the aggregated distribution of child nodes.
The third contribution of the thesis is a software tool to demonstrate the methods discussed, implemented with JavaScript/D3, with a NodeJS server to handle data processing. In addition to the linked Sunburst-Treemap view and density box plot glyphs, nodes within both views are coloured according to the separation of their sample groups, using Silhouette clustering: lighter nodes show more similar sample group values, while darker nodes show more variance between sample groups. Additional support views provide more information on the currently selected node: an analysis view offers expanded information on the node, while a Parallel coordinates plot shows the counts for each sample for each entity. An additional overview of the full data is provided through multiple small sunbursts, allowing different sample combinations to be viewed side by side before delving into deeper exploration. A final study was conducted to demonstrate the methods and the tool to end-users, using a series of heuristics for them to evaluate the components of the tool. Overall, user response to the tool was positive, and the feedback provided guided the future development of the tool.
Description: PhD Thesis
Appears in Collections: School of Computing

Files in This Item:
File                       Description   Size       Format
Macquisten A M 2022.pdf                  7.81 MB    Adobe PDF
dspacelicence.pdf                        43.82 kB   Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.