Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01c534fr81b
Title: | MCMC algorithms for sampling from multimodal and changing distributions |
Authors: | Lee, Holden |
Advisors: | Arora, Sanjeev |
Contributors: | Mathematics Department |
Keywords: | algorithms; machine learning; Markov chain; MCMC; probability |
Subjects: | Mathematics; Applied mathematics; Computer science |
Issue Date: | 2019 |
Publisher: | Princeton, NJ : Princeton University |
Abstract: | The problem of sampling from a probability distribution is a fundamental problem in Bayesian statistics and machine learning, with applications throughout the sciences. One common algorithmic framework for solving this problem is Markov chain Monte Carlo (MCMC). However, a large gap exists between the simple settings where MCMC is proven to work and the complex settings arising in practice. In this thesis, I make progress towards closing this gap, focusing on two hurdles in particular.

In Chapter 2, I consider the problem of sampling from *multimodal* distributions. Many distributions arising in practice, from simple mixture models to deep generative models, are multimodal, so any Markov chain that makes only local moves will get stuck in a single mode. Although a variety of temperature heuristics are used to address this problem, their theoretical guarantees are not well understood even for simple multimodal distributions. I analyze an algorithm combining Langevin diffusion with *simulated tempering*, a heuristic that speeds up mixing by transitioning between different temperatures of the distribution. I develop a general method for proving mixing times using "soft decompositions" of Markov processes, and use it to prove rapid mixing for mixtures of polynomially many log-concave distributions.

In Chapter 3, I address the problem of sampling from the distributions $p_t(x) \propto e^{-\sum_{k=0}^t f_k(x)}$ for each epoch $1\le t\le T$ in an *online* manner, given a sequence of (convex) functions $f_0,\ldots, f_T$. This problem arises in large-scale Bayesian inference (for instance, online logistic regression), where instead of obtaining all the observations at once, one constantly acquires new data and must continuously update the distribution. All previous results for this problem imply a bound on the number of gradient evaluations at each epoch $t$ that grows at least linearly in $t$. I show that a certain variance-reduced SGLD (stochastic gradient Langevin dynamics) algorithm solves the online sampling problem to within a fixed total variation error $\epsilon$ using an *almost constant* number of gradient evaluations per epoch. |
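To make the Chapter 2 setup concrete, below is a minimal sketch of Langevin dynamics combined with simulated tempering. Everything here is an illustrative assumption rather than the thesis's exact construction: the target is an equal mixture of two Gaussians, the inverse-temperature ladder `betas`, the step size `eta`, and the equal level weights in the swap ratio are all chosen for simplicity (in practice the relative partition functions must be estimated).

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Negative log-density of an equal mixture of N(-3, 1) and N(3, 1):
    # a simple multimodal target (illustrative assumption).
    return -np.log(0.5 * np.exp(-0.5 * (x - 3.0) ** 2)
                   + 0.5 * np.exp(-0.5 * (x + 3.0) ** 2))

def grad_f(x, h=1e-5):
    # Finite-difference gradient keeps the sketch self-contained.
    return (f(x + h) - f(x - h)) / (2 * h)

betas = [0.1, 0.3, 1.0]   # inverse-temperature ladder (assumed)
eta = 1e-2                # Langevin step size (assumed)

def simulated_tempering_langevin(n_steps=50_000):
    x, i = 0.0, 0          # position and current temperature index
    samples = []
    for _ in range(n_steps):
        # Discretized Langevin step targeting p_i(x) ∝ exp(-beta_i * f(x)).
        x = x - eta * betas[i] * grad_f(x) + np.sqrt(2 * eta) * rng.standard_normal()
        # Propose a move to an adjacent temperature level; accept with the
        # Metropolis ratio for the joint chain on (x, i), assuming equal
        # level weights.
        j = i + rng.choice([-1, 1])
        if 0 <= j < len(betas) and np.log(rng.random()) < -(betas[j] - betas[i]) * f(x):
            i = j
        if i == len(betas) - 1:    # record samples only at beta = 1
            samples.append(x)
    return np.array(samples)
```

At the high temperatures the flattened density lets the chain cross between the modes at ±3, and the level moves carry that mixing down to β = 1; this interplay between levels is what the chapter's "soft decomposition" technique analyzes.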
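For Chapter 3, the sketch below shows what a variance-reduced SGLD update can look like in the online setting, on a toy Gaussian stream. The per-term losses $f_k(x) = \tfrac12 (x - y_k)^2$, the anchor-refresh schedule, the step-size decay, and all parameter names are illustrative assumptions, not the thesis's exact algorithm; the point is the SVRG-style control variate and the amortized, near-constant gradient cost per epoch.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stream: f_k(x) = 0.5 * (x - y_k)^2, so p_t is a Gaussian posterior
# that sharpens as observations arrive (illustrative assumption).
def grad_fk(x, y):
    return x - y

def online_vr_sgld(ys, steps_per_epoch=5, batch=4, eta0=0.5):
    x = 0.0
    x_anchor, g_anchor = x, 0.0    # anchor point and its full gradient
    samples = []
    for t in range(len(ys)):
        data = ys[: t + 1]
        # Fold the new term into the anchor's full gradient: one extra
        # gradient evaluation per epoch.
        g_anchor += grad_fk(x_anchor, ys[t])
        # Refresh the anchor only at doubling epochs, so the O(t) cost of
        # a full gradient is amortized to O(1) evaluations per epoch.
        if (t & (t + 1)) == 0:     # t = 0, 1, 3, 7, 15, ...
            x_anchor = x
            g_anchor = np.sum(grad_fk(x_anchor, data))
        eta = eta0 / (t + 1)       # p_t sharpens, so the step size shrinks
        for _ in range(steps_per_epoch):
            idx = rng.integers(0, t + 1, size=batch)
            # SVRG-style control variate: an unbiased estimate of the full
            # gradient whose variance shrinks when x stays near the anchor.
            g_hat = g_anchor + (t + 1) / batch * np.sum(
                grad_fk(x, data[idx]) - grad_fk(x_anchor, data[idx]))
            x = x - eta * g_hat + np.sqrt(2 * eta) * rng.standard_normal()
        samples.append(x)          # one approximate draw from p_t per epoch
    return samples
```

For instance, `online_vr_sgld(rng.normal(1.0, 1.0, size=200))` returns one approximate posterior draw per epoch; the point of the variance reduction is that `steps_per_epoch` and `batch` stay fixed while `t` grows, rather than scaling linearly with the number of accumulated terms.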
URI: | http://arks.princeton.edu/ark:/88435/dsp01c534fr81b |
Alternate format: | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu |
Type of Material: | Academic dissertations (Ph.D.) |
Language: | en |
Appears in Collections: | Mathematics |
Files in This Item:
File | Description | Size | Format
---|---|---|---
Lee_princeton_0181D_12948.pdf | | 1.25 MB | Adobe PDF