Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/99999/fk4st9290q
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Engelhardt, Barbara | |
dc.contributor.author | Gundersen, Gregory | |
dc.contributor.other | Computer Science Department | |
dc.date.accessioned | 2021-10-04T13:25:07Z | - |
dc.date.available | 2021-10-04T13:25:07Z | - |
dc.date.created | 2021-01-01 | |
dc.date.issued | 2021 | |
dc.identifier.uri | http://arks.princeton.edu/ark:/99999/fk4st9290q | - |
dc.description.abstract | Latent variables allow researchers and engineers to encode assumptions into their statistical models. A latent variable might, for example, represent an unobserved covariate, measurement error, or a missing class label. Inference is challenging because one must account for the conditional dependence structure induced by these variables, and marginalization is often intractable. In this thesis, I present several practical algorithms for inferring latent structure in probabilistic models used in computational biology, neuroscience, and time-series analysis. First, I present a multi-view framework that combines neural networks and probabilistic canonical correlation analysis to estimate shared and view-specific latent structure from paired samples of histological images and gene expression levels. The model is trained end-to-end to estimate all parameters simultaneously, and we show that the latent variables capture interpretable structure, such as tissue-specific and morphological variation. Next, I present a family of nonlinear dimension-reduction models that use random features to support non-Gaussian data likelihoods. By approximating the nonlinear relationship between the latent variables and observations with a function that is linear with respect to random features, we induce closed-form gradients of the posterior distribution with respect to the latent variables, enabling gradient-based inference in nonlinear dimension-reduction models for a variety of data likelihoods. Finally, I discuss lowering the computational cost of online Bayesian filtering of time series with abrupt changes in structure, called changepoints. We consider settings in which a time series has multiple data sources, each with an associated cost. We trade the cost of a data source against its quality, or "fidelity," and examine how fidelity affects the estimation of changepoints. Our framework makes cost-sensitive decisions about which data source to use by minimizing the information entropy of the posterior distribution over changepoints. (Illustrative code sketches of each contribution follow the metadata table below.) | |
dc.format.mimetype | application/pdf | |
dc.language.iso | en | |
dc.publisher | Princeton, NJ : Princeton University | |
dc.relation.isformatof | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: http://catalog.princeton.edu | |
dc.subject | Bayesian inference | |
dc.subject | changepoint detection | |
dc.subject | Gaussian processes | |
dc.subject | latent variable modeling | |
dc.subject | probabilistic modeling | |
dc.subject.classification | Artificial intelligence | |
dc.title | Practical Algorithms for Latent Variable Models | |
dc.type | Academic dissertations (Ph.D.) | |
pu.date.classyear | 2021 | |
pu.department | Computer Science | |
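
The abstract's first contribution pairs neural networks with probabilistic canonical correlation analysis to separate shared from view-specific variation. As a rough illustration of the generative structure only, here is a minimal PyTorch sketch in which each paired sample is decoded from one shared latent plus a private latent per view. The class name `TwoViewDecoder`, layer sizes, and latent dimensions are illustrative assumptions, not the thesis's architecture.

```python
import torch
import torch.nn as nn

class TwoViewDecoder(nn.Module):
    """Sketch of the generative half of a deep probabilistic CCA model:
    one shared latent plus a private latent per view, each pair decoded
    by a view-specific network. All dimensions are illustrative."""
    def __init__(self, d_shared=10, d_private=5, d_img=784, d_expr=500):
        super().__init__()
        self.decode_img = nn.Sequential(
            nn.Linear(d_shared + d_private, 256), nn.ReLU(),
            nn.Linear(256, d_img),
        )
        self.decode_expr = nn.Sequential(
            nn.Linear(d_shared + d_private, 256), nn.ReLU(),
            nn.Linear(256, d_expr),
        )
        self.d_shared, self.d_private = d_shared, d_private

    def forward(self, n):
        z = torch.randn(n, self.d_shared)        # shared across both views
        z_img = torch.randn(n, self.d_private)   # image-specific variation
        z_expr = torch.randn(n, self.d_private)  # expression-specific variation
        x_img = self.decode_img(torch.cat([z, z_img], dim=1))
        x_expr = self.decode_expr(torch.cat([z, z_expr], dim=1))
        return x_img, x_expr

imgs, exprs = TwoViewDecoder()(n=8)   # paired draws from the prior
print(imgs.shape, exprs.shape)        # torch.Size([8, 784]) torch.Size([8, 500])
```

Training such a model end-to-end, as the abstract describes, would add per-view likelihoods and an inference network; this sketch shows only why a shared latent captures structure common to both views while the private latents absorb view-specific variation.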
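The second contribution's computational trick is that a map which is linear in random features yields closed-form gradients with respect to the latent variables, even for non-Gaussian likelihoods. The sketch below is one possible concrete instantiation, not the thesis's model: random Fourier features with a Poisson likelihood, with the analytic gradient checked against finite differences. The feature count, kernel choice, and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D, M = 2, 100                      # latent dimension, number of random features
W = rng.normal(size=(M, D))        # random Fourier frequencies (RBF kernel)
b = rng.uniform(0, 2 * np.pi, M)   # random phases
beta = 0.1 * rng.normal(size=M)    # linear weights on the features

def features(x):
    """Random Fourier features: phi(x) approximates an RBF-kernel map."""
    return np.sqrt(2.0 / M) * np.cos(W @ x + b)

def log_lik_and_grad(x, y):
    """Poisson log-likelihood of count y with rate exp(beta . phi(x)),
    plus its closed-form gradient with respect to the latent x."""
    phi = features(x)
    rate = np.exp(beta @ phi)
    ll = y * (beta @ phi) - rate                      # dropping -log(y!)
    # d phi / d x = -sqrt(2/M) * sin(Wx + b)[:, None] * W
    dphi = -np.sqrt(2.0 / M) * np.sin(W @ x + b)[:, None] * W
    grad = (y - rate) * (beta @ dphi)                 # chain rule, in closed form
    return ll, grad

# Verify the analytic gradient with central finite differences.
x, y, eps = rng.normal(size=D), 3, 1e-6
_, g = log_lik_and_grad(x, y)
g_fd = np.array([
    (log_lik_and_grad(x + eps * e, y)[0] - log_lik_and_grad(x - eps * e, y)[0]) / (2 * eps)
    for e in np.eye(D)
])
print(np.allclose(g, g_fd, atol=1e-5))  # True
```

Because the likelihood depends on the latents only through the linear form `beta @ phi(x)`, swapping in a different exponential-family likelihood changes only the `(y - rate)` factor, which is the sense in which the approach supports "a variety of data likelihoods."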
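The third contribution chooses among data sources of differing cost and fidelity so as to minimize the entropy of the posterior over changepoints. A toy sketch under strong assumptions: a standard Adams–MacKay run-length update with a Gaussian model, a Monte Carlo estimate of each source's expected posterior entropy, and a greedy entropy-plus-priced-cost rule. The source definitions, hazard rate, and trade-off weight `LAM` are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
HAZARD = 1 / 50          # prior probability of a changepoint at each step
MU0, VAR0 = 0.0, 4.0     # prior on each segment's mean
# Hypothetical sources: (observation-noise variance, cost). Lower noise
# (higher fidelity) costs more; these numbers are made up.
SOURCES = {"cheap": (4.0, 1.0), "precise": (0.25, 5.0)}
LAM = 0.02               # entropy-vs-cost trade-off weight

def update(p, mu, var, x, noise):
    """One Adams-MacKay run-length update (Gaussian, known obs. noise)."""
    pred = np.exp(-0.5 * (x - mu) ** 2 / (var + noise)) \
           / np.sqrt(2 * np.pi * (var + noise))
    growth = p * pred * (1 - HAZARD)            # run continues
    cp = np.sum(p * pred * HAZARD)              # run resets to zero
    p_new = np.append(cp, growth)
    p_new /= p_new.sum()
    var_post = 1.0 / (1.0 / var + 1.0 / noise)  # conjugate mean update
    mu_post = var_post * (mu / var + x / noise)
    return p_new, np.append(MU0, mu_post), np.append(VAR0, var_post)

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def expected_entropy(p, mu, var, noise, n_draws=64):
    """Monte Carlo estimate of the run-length posterior entropy after
    one hypothetical observation from a source with this noise level."""
    total = 0.0
    for _ in range(n_draws):
        r = rng.choice(len(p), p=p)                     # sample a run length
        x = rng.normal(mu[r], np.sqrt(var[r] + noise))  # predictive draw
        total += entropy(update(p, mu, var, x, noise)[0])
    return total / n_draws

# Greedy cost-sensitive choice: minimize expected entropy plus priced cost.
p, mu, var = np.array([1.0]), np.array([MU0]), np.array([VAR0])
true_means = np.repeat([0.0, 3.0], 30)                  # one changepoint at t=30
for m in true_means:
    name, (noise, cost) = min(
        SOURCES.items(),
        key=lambda kv: expected_entropy(p, mu, var, kv[1][0]) + LAM * kv[1][1],
    )
    x = rng.normal(m, np.sqrt(noise))                   # query the chosen source
    p, mu, var = update(p, mu, var, x, noise)
print("MAP run length at t=60:", int(np.argmax(p)))     # typically near 30
```

The intuition the sketch tries to capture: when the run-length posterior is already sharp, the cheap source's expected entropy is nearly as low as the precise source's, so paying for fidelity is only worthwhile when the posterior is uncertain, e.g. just after a suspected changepoint.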
Appears in Collections: Computer Science
Files in This Item:
File | Size | Format
---|---|---
Gundersen_princeton_0181D_13724.pdf | 16.77 MB | Adobe PDF