Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp019c67wq32b
Title: | Robust High-Dimensional Regression and Factor Models |
Authors: | Wang, Yuyan |
Advisors: | Fan, Jianqing |
Contributors: | Operations Research and Financial Engineering Department |
Keywords: | Factor models; High-dimensional linear regression; Mixture modeling of hurricane; Robust methods |
Subjects: | Statistics |
Issue Date: | 2016 |
Publisher: | Princeton, NJ : Princeton University |
Abstract: | High-throughput technologies generate datasets with huge dimensionality, large sample sizes and heterogeneous noise. Many traditional methods become computationally infeasible or are no longer applicable to these datasets. My primary research is driven by the need for powerful statistical tools to address the challenges brought by big data from various fields in high-dimensional settings, with applications in finance and the natural sciences. One major challenge brought by big data is ultra-high-dimensional covariates, which render traditional regression techniques inadequate. Data subject to heavy-tailed errors are commonly encountered in various scientific fields. In Chapter 1, we propose RA-Lasso, which robustly estimates the mean regression function without the symmetry or light-tail assumptions on the data that are required by all existing methods. We show that RA-Lasso produces a consistent estimator at the same rate as the optimal rate even under light-tail situations. Another challenge in big data is heterogeneity. A typical regression model assumes that the data are homogeneous, in that the regression coefficients are the same for all observations, an assumption that is often inadequate in reality, especially in high-dimensional scenarios. In Chapter 3, we discuss the performance of existing prediction models, including sparse linear and nonlinear models and mixture models, in hurricane prediction. We also develop a mixture model with sparse and additive components that captures heterogeneity, sparsity and nonlinearity simultaneously, and which is more flexible at the expense of a moderate increase in model complexity. Big data with high dimensionality can also be a blessing when there are underlying common factors that drive the whole dataset. In Chapter 2, we discuss a thought-provoking scenario in which we are interested in a small subset of assets but have the whole market data available.
We show that, in terms of Fisher information and convergence rate, it is far more beneficial to use the whole market data than only the set of interest for covariance estimation in factor models, a fact ignored by almost all practitioners. |
URI: | http://arks.princeton.edu/ark:/88435/dsp019c67wq32b |
Alternate format: | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu |
Type of Material: | Academic dissertations (Ph.D.) |
Language: | en |
Appears in Collections: | Operations Research and Financial Engineering |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Wang_princeton_0181D_11813.pdf | | 1.45 MB | Adobe PDF | View/Download |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.