Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp019z903235x
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Schapire, Robert E | - |
dc.contributor.advisor | Engelhardt, Barbara E | - |
dc.contributor.author | Basbug, Mehmet Emin | - |
dc.contributor.other | Electrical Engineering Department | - |
dc.date.accessioned | 2017-04-12T20:36:23Z | - |
dc.date.available | 2017-04-12T20:36:23Z | - |
dc.date.issued | 2017 | - |
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp019z903235x | - |
dc.description.abstract | Latent variable models have two basic components: a latent structure encoding a hypothesized complex pattern and an observation model capturing the data distribution. With the advancements in machine learning and increasing availability of resources, we are able to perform inference in deeper and more sophisticated latent variable models. In most cases, these models are designed with a particular application in mind; hence, they tend to have restrictive observation models. The challenge, surfaced with the increasing diversity of data sets, is to generalize these latent models to work with different data types. We aim to address this problem by utilizing exponential dispersion models (EDMs) and proposing mechanisms for incorporating them into latent structures. In Chapter 2, we show that the density of steep EDMs can be expressed with a Bregman divergence. Based on this relationship, we parametrize families of steep EDMs for various data types. We then use these families in the mixture model setting and propose an expectation-maximization algorithm (AdaCluster) that can identify the underlying distribution of each attribute in a heterogeneous data set. In Chapter 3, we generalize hierarchical Poisson factorization, a Bayesian non-negative matrix factorization model, by compounding the original Poisson output with EDMs. We show that the proposed model is particularly effective for large data sets with extreme sparsity and arbitrary data distribution. In Chapter 4, we use the compound-Poisson-EDM structure within the context of missing data. We show that an arbitrary data-generating model with EDM output (such as a Gaussian mixture model, probabilistic matrix factorization, a Poisson mixture model, or a linear regression model) can be coupled with a Poisson factorization encoding the missing-data pattern through compounding. In particular, we argue that the heteroscedastic impact of the missing-data pattern on the dispersion of the observation variable can be captured with the proposed model. | - |
dc.language.iso | en | - |
dc.publisher | Princeton, NJ : Princeton University | - |
dc.relation.isformatof | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: http://catalog.princeton.edu | - |
dc.subject | Bregman Divergence | - |
dc.subject | Clustering | - |
dc.subject | Exponential Dispersion Model | - |
dc.subject | Machine Learning | - |
dc.subject | Matrix Factorization | - |
dc.subject | Missing Data | - |
dc.subject.classification | Artificial intelligence | - |
dc.subject.classification | Computer science | - |
dc.title | Integrating Exponential Dispersion Models to Latent Structures | - |
dc.type | Academic dissertations (Ph.D.) | - |
pu.projectgrantnumber | 690-2143 | - |
Appears in Collections: | Electrical Engineering |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Basbug_princeton_0181D_12037.pdf | | 2.32 MB | Adobe PDF | View/Download |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.