Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/99999/fk43b7hm9h
Title: Selected Topics in Deep Learning Theory and Continuous-time Hidden Markov Models
Authors: Wang, Qingcan
Advisors: E, Weinan
Contributors: Applied and Computational Mathematics Department
Subjects: Applied mathematics
Issue Date: 2021
Publisher: Princeton, NJ : Princeton University
Abstract: The first part of the thesis proves theoretical results in deep learning. For the approximation problem, we prove that deep neural networks can approximate analytic functions exponentially fast: the number of parameters needed to achieve an error tolerance of epsilon is O((log 1/epsilon)^d), exponential in the sense that the rate depends only on log 1/epsilon rather than on epsilon itself. We also develop a general method to show that deep networks never have worse approximation properties than shallow ones. For the optimization problem, we analyze the global convergence of gradient descent for deep linear residual networks by proposing a new initialization, the zero-asymmetric (ZAS) initialization, motivated by avoiding the stable manifolds of saddle points. We prove that under the ZAS initialization, for an arbitrary target matrix, gradient descent converges to an epsilon-optimal point in O(L^3 log 1/epsilon) iterations, which scales polynomially with the network depth L. This demonstrates the importance of the residual structure and the initialization in optimizing deep linear neural networks.

The second part focuses on continuous-time hidden Markov models (CT-HMM), in which both the hidden states and the observations evolve in continuous time. We propose a unified framework that obtains the model parameter estimation by taking the continuous-time limit of the classical discrete-time Baum-Welch algorithm, and that recovers and extends several previous results on CT-HMMs under different settings. Two settings are illustrated: a hidden jump process with a finite state space, and a hidden diffusion process with a continuous state space. For each setting, we first estimate the hidden state given the observations and model parameters, showing that the posterior distribution of the hidden states is described by differential equations in continuous time. We then consider the estimation of unknown model parameters, deriving continuous-time formulas for the expectation-maximization algorithm. We also propose a Monte Carlo method based on the continuous-time formulation that samples the posterior distribution of the hidden states and updates the parameter estimates.
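The approximation rate quoted in the abstract can be read in reverse: if the parameter count m grows like (log 1/epsilon)^d, then the achievable error decays exponentially in m^(1/d). A one-line restatement with constants suppressed (an illustrative rewriting, not taken from the thesis):

```latex
m \;=\; O\!\left(\left(\log\tfrac{1}{\varepsilon}\right)^{d}\right)
\quad\Longleftrightarrow\quad
\varepsilon \;=\; \exp\!\left(-\Omega\!\left(m^{1/d}\right)\right)
```

That is, the error decreases exponentially fast in the d-th root of the number of parameters, which is the sense in which the rate depends only on log 1/epsilon.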
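The following is a minimal numerical sketch of the optimization setting described in the abstract: gradient descent on a deep linear residual network f(x) = (I + A_L) ... (I + A_1) x fitted to a target matrix Phi, with every residual block initialized to zero. This is a simplified stand-in, not the thesis's exact zero-asymmetric (ZAS) construction; the function name, the loss, and the hyperparameters are illustrative assumptions.

```python
# Hypothetical illustration: plain gradient descent on a deep *linear*
# residual network whose end-to-end map (I + A_L) ... (I + A_1) is fitted
# to a target matrix Phi.  All blocks start at zero, so the network starts
# at the identity (a simplification of the thesis's ZAS initialization).
import numpy as np

def train_linear_resnet(Phi, L=10, lr=1e-2, tol=1e-6, max_iter=10_000):
    d = Phi.shape[0]
    A = [np.zeros((d, d)) for _ in range(L)]    # every residual block starts at zero

    def end_to_end(blocks):
        P = np.eye(d)
        for Al in blocks:                       # apply (I + A_1) first, (I + A_L) last
            P = (np.eye(d) + Al) @ P
        return P

    for it in range(max_iter):
        P = end_to_end(A)
        R = P - Phi                             # residual of the end-to-end map
        loss = 0.5 * np.linalg.norm(R, "fro") ** 2
        if loss < tol:
            break
        # dL/dA_l = S_l^T R B_l^T, where B_l is the product of blocks below l
        # and S_l is the product of blocks above l.
        below = [np.eye(d)]
        for Al in A[:-1]:
            below.append((np.eye(d) + Al) @ below[-1])
        above = np.eye(d)
        grads = [None] * L
        for l in reversed(range(L)):
            grads[l] = above.T @ R @ below[l].T
            above = above @ (np.eye(d) + A[l])
        for l in range(L):
            A[l] -= lr * grads[l]               # gradient descent step
    return A, it, loss
```

For example, `train_linear_resnet(np.diag([2.0, 0.5]), L=8)` drives the end-to-end product toward the target. Convergence for an arbitrary target matrix is what the ZAS result in the thesis establishes; it is not guaranteed by this simplified zero initialization.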
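For the CT-HMM part, the abstract's starting point is the classical discrete-time Baum-Welch algorithm. Below is a minimal, self-contained sketch of its E-step (the scaled forward-backward recursions) for a finite-state HMM; the variable names and scaling scheme are illustrative. The thesis's contribution is the continuous-time limit of such recursions, in which the posterior of the hidden states is governed by differential equations rather than the discrete updates shown here.

```python
# Hypothetical illustration: E-step of the classical discrete-time Baum-Welch
# algorithm (scaled forward-backward).  A: K x K transition matrix,
# B: K x M emission matrix, pi: initial distribution, obs: observation indices.
import numpy as np

def forward_backward(A, B, pi, obs):
    K, T = A.shape[0], len(obs)
    alpha = np.zeros((T, K))          # scaled alpha[t, i] ~ P(o_1..o_t, x_t = i)
    beta = np.zeros((T, K))           # scaled beta[t, i]  ~ P(o_{t+1}..o_T | x_t = i)
    scale = np.zeros(T)               # scaling factors for numerical stability

    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):             # forward recursion
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):    # backward recursion
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1]) / scale[t + 1]

    gamma = alpha * beta              # posterior P(x_t = i | o_1..o_T)
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma
```

For instance, `forward_backward(np.array([[0.9, 0.1], [0.2, 0.8]]), np.array([[0.7, 0.3], [0.1, 0.9]]), np.array([0.5, 0.5]), [0, 1, 1, 0])` returns the 4 x 2 matrix of state posteriors used in the M-step of parameter re-estimation.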
URI: http://arks.princeton.edu/ark:/99999/fk43b7hm9h
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Applied and Computational Mathematics
Files in This Item:
| File | Size | Format |
|---|---|---|
| Wang_princeton_0181D_13713.pdf | 1.2 MB | Adobe PDF |