Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp01t435gg78m
Title: Statistical Optimization for High Dimensional Machine Learning with Privacy and Sparsity Constraints
Authors: Ge, Jason
Advisors: Wang, Mengdi
Contributors: Operations Research and Financial Engineering Department
Keywords: High dimensional statistics; Machine Learning; Optimization algorithms; Privacy-preserving data analysis; Sparse Learning
Subjects: Statistics; Operations research
Issue Date: 2019
Publisher: Princeton, NJ : Princeton University
Abstract: High dimensionality and privacy-sensitive data in modern machine learning pose computational challenges that call for novel optimization algorithms. This thesis presents optimization algorithms that achieve superior theoretical and empirical performance by leveraging the structure of the statistical model underlying a machine learning problem. Chapter 2 proposes a distributed privacy-preserving sparse PCA (DPS-PCA) algorithm that generates a minimax-optimal sparse PCA estimator under differential privacy constraints. Data providers can use this algorithm to collaboratively analyze the union of their datasets while limiting the disclosure of their private information. DPS-PCA recovers the leading eigenspace of the population covariance at a geometric convergence rate while simultaneously achieving the minimax-optimal statistical error for high-dimensional data. Chapter 3 presents a pathwise active-set proximal Newton algorithm for estimating L1-regularized sparse generalized linear models in high dimensions. With an efficient active-set strategy and a pathwise optimization scheme, the algorithm leverages the restricted strong convexity of the loss function (i.e., the negative log-likelihood) and achieves an asymptotic quadratic rate of convergence, even when the loss function itself is not strongly convex. Chapter 4 presents a DC (difference-of-convex) proximal Newton algorithm for solving high-dimensional sparse learning problems under nonconvex penalties such as MCP and SCAD. We prove that the proposed algorithm achieves local quadratic convergence within each stage of convex relaxation, and that only a few convex relaxation stages are needed to obtain a sparse approximate local optimum. Finally, Chapter 5 presents an R and Python package that implements the sparse learning algorithms; the package achieves state-of-the-art performance among available sparse learning solvers that handle L1 and nonconvex penalties.
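To make the abstract's ideas concrete, a few hedged sketches follow. First, for Chapter 2's theme: the sketch below illustrates the generic "perturb and truncate" idea behind differentially private sparse PCA via a noisy truncated power iteration on a single machine. It is not the thesis's distributed DPS-PCA protocol; the function name is ours, and the noise scale `sigma` is a placeholder, since an actual (eps, delta)-differential-privacy guarantee requires calibrating `sigma` to the sensitivity of the covariance input.

```python
import numpy as np

def noisy_truncated_power_iteration(S, k, sigma, n_iter=50, seed=0):
    """Illustrative only: noisy power iteration with hard truncation.

    Each step perturbs the power-iteration update with Gaussian noise
    (the Gaussian mechanism) and then keeps only the k largest-magnitude
    entries of the iterate to enforce sparsity. `sigma` is NOT calibrated
    here; a real privacy guarantee needs a sensitivity analysis of S.
    """
    rng = np.random.default_rng(seed)
    p = S.shape[0]
    v = rng.standard_normal(p)
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        y = S @ v + sigma * rng.standard_normal(p)  # noisy power step
        drop = np.argsort(np.abs(y))[:-k]           # all but the k largest entries
        y[drop] = 0.0                               # hard truncation to sparsity k
        v = y / max(np.linalg.norm(y), 1e-12)
    return v
```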
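Second, for Chapter 3's theme: a minimal sketch of a pathwise active-set proximal Newton solver for L1-penalized logistic regression. This is written under simplifying assumptions (dense Hessian, fixed inner sweep count, logistic loss only) and should not be read as the thesis's implementation; all function names are ours.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal map of t * |.|: shrink z toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_newton_l1_logistic(X, y, lam, w0, max_outer=20, inner_sweeps=10, tol=1e-8):
    """Proximal Newton for L1-penalized logistic regression, y in {-1, +1}.

    Minimizes (1/n) * sum_i log(1 + exp(-y_i * x_i @ w)) + sum_j lam_j * |w_j|.
    Each outer step builds a local quadratic model of the loss and solves the
    penalized model by coordinate descent restricted to an active set: the
    coordinates that are nonzero or violate the KKT stationarity condition.
    `lam` may be a scalar or a length-p vector of per-coordinate weights.
    """
    n, p = X.shape
    lam = np.broadcast_to(np.asarray(lam, dtype=float), (p,))
    w = w0.astype(float).copy()
    for _ in range(max_outer):
        s = 1.0 / (1.0 + np.exp(y * (X @ w)))    # sigma(-y_i * x_i @ w)
        grad = -(X.T @ (y * s)) / n              # gradient of the logistic loss
        d = s * (1.0 - s) / n
        H = X.T @ (d[:, None] * X)               # Hessian of the logistic loss
        active = np.flatnonzero((w != 0.0) | (np.abs(grad) > lam))
        w_new = w.copy()
        for _ in range(inner_sweeps):            # coordinate descent on the model
            for j in active:
                b = grad[j] + H[j] @ (w_new - w) - H[j, j] * (w_new[j] - w[j])
                w_new[j] = soft_threshold(H[j, j] * w[j] - b, lam[j]) / max(H[j, j], 1e-12)
        converged = np.max(np.abs(w_new - w)) < tol
        w = w_new
        if converged:
            break
    return w

def pathwise_l1_logistic(X, y, n_lambdas=20, ratio=0.01):
    """Pathwise scheme: solve a geometrically decreasing lambda sequence,
    warm-starting each problem from the previous solution."""
    n, p = X.shape
    lam_max = np.max(np.abs(X.T @ y)) / (2.0 * n)  # smallest lam giving w = 0
    lams = lam_max * np.logspace(0.0, np.log10(ratio), n_lambdas)
    w, path = np.zeros(p), []
    for lam in lams:
        w = prox_newton_l1_logistic(X, y, lam, w)  # warm start
        path.append(w.copy())
    return lams, path
```

The warm start is what makes the pathwise scheme cheap: each solution is a good initial point for the next, slightly smaller lambda, so the proximal Newton steps stay in their fast local-convergence regime.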
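Third, for Chapter 4's theme: a sketch of a multistage convex relaxation for the MCP penalty, a local-linear-approximation scheme closely related to (but not necessarily identical with) the thesis's DC proximal Newton. Each stage replaces the nonconvex penalty by a weighted L1 surrogate whose weights are the MCP derivative p'(t) = max(lam - t/gamma, 0) evaluated at the previous solution, then reuses the convex routine from the previous sketch. The values gamma=3.0 and n_stages=5 are illustrative defaults, not the thesis's settings.

```python
import numpy as np

def dc_mcp_logistic(X, y, lam, gamma=3.0, n_stages=5):
    """Multistage convex relaxation for MCP-penalized logistic regression.

    Coordinates that have grown past gamma*lam receive weight zero in the
    next stage and are no longer shrunk, which removes the L1 bias on large
    coefficients. Stage 0 is the plain L1 problem; every later stage is a
    weighted-L1 convex problem warm-started at the previous solution.
    Reuses prox_newton_l1_logistic from the sketch above.
    """
    n, p = X.shape
    w = prox_newton_l1_logistic(X, y, lam, np.zeros(p))      # convex initialization
    for _ in range(n_stages):
        weights = np.maximum(lam - np.abs(w) / gamma, 0.0)   # MCP derivative at |w_j|
        w = prox_newton_l1_logistic(X, y, weights, w)        # warm-started convex stage
    return w
```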
URI: http://arks.princeton.edu/ark:/88435/dsp01t435gg78m
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Operations Research and Financial Engineering
Files in This Item:

| File | Description | Size | Format |
|---|---|---|---|
| Ge_princeton_0181D_12833.pdf | | 2.2 MB | Adobe PDF |
Items in DataSpace are protected by copyright, with all rights reserved, unless otherwise indicated.