Robust High-Dimensional Regression and Factor Models

Wang, Yuyan

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp019c67wq32b

Title:	Robust High-Dimensional Regression and Factor Models
Authors:	Wang, Yuyan
Advisors:	Fan, Jianqing
Contributors:	Operations Research and Financial Engineering Department
Keywords:	Factor models; High-dimensional linear regression; Mixture modeling of hurricane; Robust methods
Subjects:	Statistics
Issue Date:	2016
Publisher:	Princeton, NJ : Princeton University
Abstract:	High-throughput technologies generate datasets with huge dimensionality, large sample sizes and heterogeneous noise. Many traditional methods become computationally infeasible or are no longer applicable to such datasets. My primary research is driven by the need for powerful statistical tools that address the challenges brought by big data in high-dimensional settings, with applications in finance and the natural sciences. One major challenge is ultra-high-dimensional covariates, which render traditional regression techniques inadequate; moreover, data subject to heavy-tailed errors are common in many scientific fields. In Chapter 1, we propose RA-Lasso, which robustly estimates the mean regression function without the symmetry or light-tail assumptions on the data that all existing methods require. We show that RA-Lasso produces a consistent estimator whose convergence rate matches the optimal rate attainable in the light-tailed case. Another challenge in big data is heterogeneity. A typical regression model assumes that the data are homogeneous, in the sense that the regression coefficients are the same for all observations; this assumption is often inadequate in practice, especially in high-dimensional settings. In Chapter 3, we discuss the performance of existing prediction models, including sparse linear and nonlinear models and mixture models, in hurricane prediction. We also develop a mixture model with sparse and additive components that captures heterogeneity, sparsity and nonlinearity simultaneously; it is more flexible at the expense of a moderate increase in model complexity. Big data with high dimensionality can also be a blessing when there are underlying common factors that drive the whole dataset. In Chapter 2, we discuss a thought-provoking scenario in which we are interested in only a small subset of assets but have data on the whole market available. We show that, for covariance estimation in factor models, using the whole market data is far more beneficial, in terms of Fisher information and convergence rate, than using only the subset of interest, a fact ignored by almost all practitioners.
URI:	http://arks.princeton.edu/ark:/88435/dsp019c67wq32b
Alternate format:	The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu
Type of Material:	Academic dissertations (Ph.D.)
Language:	en
Appears in Collections:	Operations Research and Financial Engineering
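The abstract describes RA-Lasso only at a high level. As a hedged illustration, and not the dissertation's own implementation, the sketch below assumes RA-Lasso can be read as an l1-penalized Huber-loss regression whose threshold `tau` is allowed to grow with the sample size. It fits that objective by proximal gradient descent (ISTA). The function names `huber_lasso` and `soft_threshold`, the `tau` parameterization, the fixed step-size rule and the simulated asymmetric, heavy-tailed data are all assumptions made for this example.

```python
import numpy as np

def huber_grad(r, tau):
    # Derivative of the Huber loss rho_tau(r) = r^2/2 for |r| <= tau and
    # tau*|r| - tau^2/2 otherwise: equals r inside [-tau, tau], +/- tau outside.
    return np.clip(r, -tau, tau)

def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1 (coordinate-wise shrinkage toward zero).
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def huber_lasso(X, y, lam, tau, n_iter=2000):
    """Hypothetical sketch: minimize (1/n) * sum_i rho_tau(y_i - x_i^T beta)
    + lam * ||beta||_1 by proximal gradient descent with a fixed step."""
    n, p = X.shape
    # rho_tau has curvature at most 1, so the gradient of the smooth part is
    # Lipschitz with constant ||X||_2^2 / n; step with its inverse.
    step = n / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    for _ in range(n_iter):
        r = y - X @ beta
        grad = -X.T @ huber_grad(r, tau) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# Usage on simulated data (assumed setup): sparse coefficients and
# asymmetric, heavy-tailed noise centred to mean zero, so the target is
# still the mean regression function.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
eps = rng.lognormal(0.0, 1.5, n) - np.exp(1.5 ** 2 / 2)
y = X @ beta_true + eps
beta_hat = huber_lasso(X, y, lam=0.1, tau=2.0)
print(np.round(beta_hat[:5], 2))
```

The threshold `tau` trades robustness against bias: a small `tau` caps the influence of large residuals, while a larger `tau` moves the loss toward squared error and reduces the bias that asymmetric errors would otherwise introduce into a mean-regression estimate. Letting `tau` grow with `n` is one way to balance the two.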

Files in This Item:

File	Description	Size	Format
Wang_princeton_0181D_11813.pdf		1.45 MB	Adobe PDF
