Practical Algorithms for Latent Variable Models

Gundersen, Gregory

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/99999/fk4st9290q

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Engelhardt, Barbara
dc.contributor.author	Gundersen, Gregory
dc.contributor.other	Computer Science Department
dc.date.accessioned	2021-10-04T13:25:07Z	-
dc.date.available	2021-10-04T13:25:07Z	-
dc.date.created	2021-01-01
dc.date.issued	2021
dc.identifier.uri	http://arks.princeton.edu/ark:/99999/fk4st9290q	-
dc.description.abstract	Latent variables allow researchers and engineers to encode assumptions into their statistical models. A latent variable might, for example, represent an unobserved covariate, measurement error, or a missing class label. Inference is challenging because one must account for the conditional dependence structure induced by these variables, and marginalization is often intractable. In this thesis, I present several practical algorithms for inferring latent structure in probabilistic models used in computational biology, neuroscience, and time-series analysis. First, I present a multi-view framework that combines neural networks and probabilistic canonical correlation analysis to estimate shared and view-specific latent structure of paired samples of histological images and gene expression levels. The model is trained end-to-end to estimate all parameters simultaneously, and we show that the latent variables capture interpretable structure, such as tissue-specific and morphological variation. Next, I present a family of nonlinear dimension-reduction models that use random features to support non-Gaussian data likelihoods. By approximating a nonlinear relationship between the latent variables and observations with a function that is linear with respect to random features, we induce closed-form gradients of the posterior distribution with respect to the latent variables. This allows for gradient-based nonlinear dimension-reduction models for a variety of data likelihoods. Finally, I discuss lowering the computational cost of online Bayesian filtering of time series with abrupt changes in structure, called changepoints. We consider settings in which a time series has multiple data sources, each with an associated cost. We trade the cost of a data source against the quality or "fidelity" of that source and how its fidelity affects the estimation of changepoints. Our framework makes cost-sensitive decisions about which data source to use based on minimizing the information entropy of the posterior distribution over changepoints.
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.publisher	Princeton, NJ : Princeton University
dc.relation.isformatof	The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu (http://catalog.princeton.edu)
dc.subject	bayesian inference
dc.subject	changepoint detection
dc.subject	gaussian processes
dc.subject	latent variable modeling
dc.subject	probabilistic modeling
dc.subject.classification	Artificial intelligence
dc.title	Practical Algorithms for Latent Variable Models
dc.type	Academic dissertations (Ph.D.)
pu.date.classyear	2021
pu.department	Computer Science
Appears in Collections:	Computer Science

Files in This Item:

File	Size	Format
Gundersen_princeton_0181D_13724.pdf	16.77 MB	Adobe PDF
