Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp0144558h06x
Title: Latent variable models for non-Gaussian data with applications to genome-wide variation
Authors: Hao, Wei
Advisors: Storey, John D
Contributors: Quantitative Computational Biology Department
Keywords: statistical genetics
Subjects: Statistics; Genetics
Issue Date: 2018
Publisher: Princeton, NJ : Princeton University
Abstract: Low-rank latent variable models have been applied in many fields, as the usefulness of being able to capture systematic variation and reduce the dimensionality of data cannot be overstated. Principal Components Analysis is an exemplar of this idea and is considered a staple of data analysis. This thesis discusses low-rank latent variable models, primarily in the context of modeling population structure in modern genomics. The goal of our approach is to construct a general framework that utilizes the Binomial nature of genotyping data. We present multiple ways to fit models within this framework that are appropriate for the differing requirements of practitioners. We also show an application of our framework to the problem of genome-wide association testing. Further, we work on the important practical problem of validation for models of population structure, from the perspective of the population genetics principle of Hardy-Weinberg Equilibrium. Our approach to this allows for a variant-by-variant analysis in which problematic data points can be filtered before subsequent analysis. Further, these variant-level results can be aggregated to assess genome-wide goodness-of-fit and to tune model parameters. Lastly, we extend this framework for Binomial data to single-parameter exponential family data more generally. We discuss multiple ways to fit these models, as well as extensions of the goodness-of-fit test. Collectively, these methods form a novel paradigm for non-Gaussian latent variables with many potential future applications.
URI: http://arks.princeton.edu/ark:/88435/dsp0144558h06x
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Quantitative Computational Biology

Files in This Item:
File: Hao_princeton_0181D_12794.pdf
Size: 6.41 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.