Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01c534fr81b
Title: | MCMC algorithms for sampling from multimodal and changing distributions |
Authors: | Lee, Holden |
Advisors: | Arora, Sanjeev |
Contributors: | Mathematics Department |
Keywords: | algorithms; machine learning; Markov chain; MCMC; probability |
Subjects: | Mathematics; Applied mathematics; Computer science |
Issue Date: | 2019 |
Publisher: | Princeton, NJ : Princeton University |
Abstract: | The problem of sampling from a probability distribution is a fundamental problem in Bayesian statistics and machine learning, with applications throughout the sciences. One common algorithmic framework for solving this problem is Markov chain Monte Carlo (MCMC). However, a large gap exists between the simple settings where MCMC is proven to work and the complex settings arising in practice. In this thesis, I make progress towards closing this gap, focusing on two hurdles in particular.

In Chapter 2, I consider the problem of sampling from *multimodal* distributions. Many distributions arising in practice, from simple mixture models to deep generative models, are multimodal, so any Markov chain that makes only local moves will get stuck in a single mode. Although a variety of temperature heuristics are used to address this problem, their theoretical guarantees are not well understood even for simple multimodal distributions. I analyze an algorithm combining Langevin diffusion with *simulated tempering*, a heuristic that speeds up mixing by transitioning between different temperatures of the distribution. I develop a general method for proving mixing times using "soft decompositions" of Markov processes, and use it to prove rapid mixing for mixtures of polynomially many log-concave distributions.

In Chapter 3, I address the problem of sampling from the distributions $p_t(x) \propto e^{-\sum_{k=0}^t f_k(x)}$ for each epoch $1\le t\le T$ in an *online* manner, given a sequence of (convex) functions $f_0,\ldots, f_T$. This problem arises in large-scale Bayesian inference (for instance, online logistic regression), where instead of obtaining all the observations at once, one constantly acquires new data and must continuously update the distribution. All previous results for this problem imply a bound on the number of gradient evaluations at each epoch $t$ that grows at least linearly in $t$. I show that a certain variance-reduced SGLD (stochastic gradient Langevin dynamics) algorithm solves the online sampling problem to within a fixed total variation error $\epsilon$ using an *almost constant* number of gradient evaluations per epoch. |
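To make the Chapter 2 setup concrete, below is a minimal sketch of Langevin dynamics combined with simulated tempering. Everything here is an illustrative assumption rather than the thesis's exact construction: the target is an equal mixture of two Gaussians, the inverse-temperature ladder `betas`, the step size `eta`, and the equal level weights in the swap ratio are all chosen for simplicity (in practice the relative partition functions must be estimated).

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Negative log-density of an equal mixture of N(-3, 1) and N(3, 1):
    # a simple multimodal target (illustrative assumption).
    return -np.log(0.5 * np.exp(-0.5 * (x - 3.0) ** 2)
                   + 0.5 * np.exp(-0.5 * (x + 3.0) ** 2))

def grad_f(x, h=1e-5):
    # Finite-difference gradient keeps the sketch self-contained.
    return (f(x + h) - f(x - h)) / (2 * h)

betas = [0.1, 0.3, 1.0]   # inverse-temperature ladder (assumed)
eta = 1e-2                # Langevin step size (assumed)

def simulated_tempering_langevin(n_steps=50_000):
    x, i = 0.0, 0          # position and current temperature index
    samples = []
    for _ in range(n_steps):
        # Discretized Langevin step targeting p_i(x) ∝ exp(-beta_i * f(x)).
        x = x - eta * betas[i] * grad_f(x) + np.sqrt(2 * eta) * rng.standard_normal()
        # Propose a move to an adjacent temperature level; accept with the
        # Metropolis ratio for the joint chain on (x, i), assuming equal
        # level weights.
        j = i + rng.choice([-1, 1])
        if 0 <= j < len(betas) and np.log(rng.random()) < -(betas[j] - betas[i]) * f(x):
            i = j
        if i == len(betas) - 1:    # record samples only at beta = 1
            samples.append(x)
    return np.array(samples)
```

At the high temperatures the flattened density lets the chain cross between the modes at ±3, and the level moves carry that mixing down to β = 1; this interplay between levels is what the chapter's "soft decomposition" technique analyzes.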
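For Chapter 3, the sketch below shows what a variance-reduced SGLD update can look like in the online setting, on a toy Gaussian stream. The per-term losses $f_k(x) = \tfrac12 (x - y_k)^2$, the anchor-refresh schedule, the step-size decay, and all parameter names are illustrative assumptions, not the thesis's exact algorithm; the point is the SVRG-style control variate and the amortized, near-constant gradient cost per epoch.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stream: f_k(x) = 0.5 * (x - y_k)^2, so p_t is a Gaussian posterior
# that sharpens as observations arrive (illustrative assumption).
def grad_fk(x, y):
    return x - y

def online_vr_sgld(ys, steps_per_epoch=5, batch=4, eta0=0.5):
    x = 0.0
    x_anchor, g_anchor = x, 0.0    # anchor point and its full gradient
    samples = []
    for t in range(len(ys)):
        data = ys[: t + 1]
        # Fold the new term into the anchor's full gradient: one extra
        # gradient evaluation per epoch.
        g_anchor += grad_fk(x_anchor, ys[t])
        # Refresh the anchor only at doubling epochs, so the O(t) cost of
        # a full gradient is amortized to O(1) evaluations per epoch.
        if (t & (t + 1)) == 0:     # t = 0, 1, 3, 7, 15, ...
            x_anchor = x
            g_anchor = np.sum(grad_fk(x_anchor, data))
        eta = eta0 / (t + 1)       # p_t sharpens, so the step size shrinks
        for _ in range(steps_per_epoch):
            idx = rng.integers(0, t + 1, size=batch)
            # SVRG-style control variate: an unbiased estimate of the full
            # gradient whose variance shrinks when x stays near the anchor.
            g_hat = g_anchor + (t + 1) / batch * np.sum(
                grad_fk(x, data[idx]) - grad_fk(x_anchor, data[idx]))
            x = x - eta * g_hat + np.sqrt(2 * eta) * rng.standard_normal()
        samples.append(x)          # one approximate draw from p_t per epoch
    return samples
```

For instance, `online_vr_sgld(rng.normal(1.0, 1.0, size=200))` returns one approximate posterior draw per epoch; the point of the variance reduction is that `steps_per_epoch` and `batch` stay fixed while `t` grows, rather than scaling linearly with the number of accumulated terms.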
URI: | http://arks.princeton.edu/ark:/88435/dsp01c534fr81b |
Alternate format: | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu |
Type of Material: | Academic dissertations (Ph.D.) |
Language: | en |
Appears in Collections: | Mathematics |
Files in This Item:
File | Description | Size | Format
---|---|---|---
Lee_princeton_0181D_12948.pdf | | 1.25 MB | Adobe PDF