Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/99999/fk4qn7kq5d
Title: Neural Network Learning: A Multiscale-Entropy and Self-Similarity Approach
Authors: Asadi, Amir Reza
Advisors: Abbe, Emmanuel
Contributors: Electrical Engineering Department
Keywords: Chaining; Information Theory; Machine Learning; Multiscale Entropy; Neural Networks; Self-Similarity
Subjects: Information science; Computer science; Electrical engineering
Issue Date: 2021
Publisher: Princeton, NJ : Princeton University
Abstract: Neural networks are machine learning models whose original design was loosely inspired by networks of neurons in the human brain. Owing to recent technological advances that enable fast computation on larger models and more training data, neural networks have found applications in a growing number of areas of science, such as computer vision, natural language processing, and medical imaging. Despite the practical success of training these models with stochastic gradient descent (SGD) or its variants, a proper theoretical grounding for their learning mechanism remains an active area of research and a central challenge in machine learning. In this dissertation, to construct a theoretical underpinning for these machine learning models, we focus on the main characteristic of neural networks that distinguishes them from other learning models: their multilevel, hierarchical architecture. Based on ideas and tools from information theory, high-dimensional probability, and statistical physics, we present a new perspective on designing the architecture of neural networks and their training procedure, along with theoretical guarantees. The training procedure is multiscale in nature and takes the hierarchical architecture of these models into account; it is characteristically different both from SGD and its extensions, which treat the whole network as a single block, and from classical layer-wise training procedures. By extending the chaining technique of high-dimensional probability to an algorithm-dependent setting, we introduce the notion of multiscale-entropic regularization of neural networks. We show that the minimizing distribution of such a regularization can be characterized precisely with a procedure analogous to the renormalization group of statistical physics. Then, motivated by the fact that renormalization group theory rests on the notion of self-similarity, we identify an inherent type of self-similarity in neural networks with near-linear activation functions. This self-similarity is then exploited to efficiently simulate an approximation to the minimizing distribution of multiscale-entropic regularization as the training procedure. Our results can also be viewed as a multiscale extension of the celebrated Gibbs-Boltzmann distribution and the maximum entropy results of Jaynes (1957), and as a Bayesian variant of the renormalization group procedure.
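For orientation, the following is a minimal LaTeX sketch of the classical, single-scale result that the abstract says is being extended (the Gibbs-Boltzmann / maximum-entropy variational principle); the symbols L, Q, and β are generic placeholders rather than the dissertation's notation, and the dissertation's multiscale-entropic regularization generalizes this objective across the scales of the network.

```latex
% Classical (single-scale) entropic regularization: for a loss L on weights w,
% a reference (prior) distribution Q, and inverse temperature \beta > 0,
% the Gibbs variational principle gives
\[
\min_{P}\;\Big\{\,\mathbb{E}_{W\sim P}\big[L(W)\big]
  \;+\; \tfrac{1}{\beta}\, D\!\left(P \,\Vert\, Q\right)\Big\}
\;=\; -\tfrac{1}{\beta}\,\log \mathbb{E}_{W\sim Q}\!\left[e^{-\beta L(W)}\right],
\]
% with the minimum attained by the Gibbs-Boltzmann distribution
\[
\mathrm{d}P^{*}(w)
\;=\; \frac{e^{-\beta L(w)}\,\mathrm{d}Q(w)}
           {\mathbb{E}_{W\sim Q}\!\left[e^{-\beta L(W)}\right]}.
\]
```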
URI: http://arks.princeton.edu/ark:/99999/fk4qn7kq5d
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Electrical Engineering
Files in This Item:
File | Size | Format
---|---|---
Asadi_princeton_0181D_13804.pdf | 3.26 MB | Adobe PDF