Title: Scalable Bayesian Hierarchical Modelling with application in genomics
Authors: Kontaratou, Antonia
Issue Date: 2021
Publisher: Newcastle University
Abstract: Hierarchical modelling can be applied to data organised in groups, for which we are interested in describing the within and between group variability. This type of model is very useful for a broad range of statistical problems. However, due to the complex nature of some data and the continuously increasing volume of datasets, using current methodologies for Bayesian hierarchical modelling can be challenging. The algorithms currently utilised, such as the Markov Chain Monte Carlo (MCMC) family, can be computationally intensive and difficult to parallelise, often leading to extended processing times, limiting exploration of different models, especially in cases of "Big Data" applications. These algorithms can be deployed using various programming paradigms, such as object-oriented, probabilistic and functional. The latter has been gaining ground in academia and industry over recent years. This thesis is concerned with examining an approach that will harness the benefits of functional programming and aims to provide valuable insights on whether MCMC algorithms, and in particular the Gibbs sampler, implemented in a functional style, can scale better whilst remaining accurate. More specifically, we implement a Gibbs sampler in Scala to fit a Bayesian hierarchical two-way Anova model that includes interactions and accounts for various levels of asymmetry in the effects. We incorporate variable selection on the interaction effects through exploration of two techniques, an indicator variable approach, and the Horseshoe prior. In addition, we investigate under which model specifications parallelism can affect speed-up. After comparing the efficiency of the methods developed to the results deriving from some already existing libraries that automate and facilitate the modelling and inference processes, we explore their application on a yeast genome case study.
The identification of gene complexes that genetically interact with telomere capping defects is of great importance in cell biology, as research has shown that telomeres can be related to ageing and various diseases. A Bayesian hierarchical model is developed to highlight and estimate the strength of potential epistatic relationships between genes of interest. However, the methodology developed has a wider range of applications and is not limited to the yeast genome case study.
Description: Ph. D. Thesis.
Appears in Collections: School of Mathematics and Statistics

Files in This Item:
File                                           Description  Size      Format
Kontaratou Antonia Final Sumission e-copy.pdf  Thesis       26.8 MB   Adobe PDF
dspacelicence.pdf                              Licence      43.82 kB  Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.