Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp01pz50gw21f
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Fan, Jianqing | en_US |
dc.contributor.author | Barut, Ahmet Emre | en_US |
dc.contributor.other | Operations Research and Financial Engineering Department | en_US |
dc.date.accessioned | 2013-09-16T17:26:27Z | - |
dc.date.available | 2013-09-16T17:26:27Z | - |
dc.date.issued | 2013 | en_US |
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp01pz50gw21f | - |
dc.description.abstract | The aim of this thesis is to develop methods for variable selection and statistical prediction in high dimensional statistical problems. Along with proposing new procedures, the thesis also studies their theoretical properties and establishes bounds on the statistical error of the resulting estimators. The main body of the thesis is divided into three parts. Chapter 1 discusses a variable screening method for generalized linear models; the emphasis is on providing a procedure that reduces the number of variables in a reliable and fast manner. Chapter 2 considers the linear regression problem in high dimensions when the noise has heavy tails; to perform robust variable selection, a new method, called the adaptive robust Lasso, is introduced. Finally, Chapter 3 addresses high dimensional classification problems; a robust approach is proposed and its theoretical properties are established. Overall, the methods proposed in this thesis collectively address many of the issues arising in high dimensional statistics, from screening to variable selection. In Chapter 1, we study the variable screening problem for generalized linear models. In many applications, researchers have prior knowledge that a certain set of variables is related to the response. In such a situation, a natural assessment of the relative importance of the other predictors is the conditional contribution of each individual predictor in the presence of the known set of variables. This leads to conditional sure independence screening (CSIS). We propose and study CSIS in the context of generalized linear models. For ultrahigh-dimensional statistical problems, we give conditions under which sure screening is possible and derive an upper bound on the number of selected variables. We also spell out the situation under which CSIS yields model selection consistency. In Chapter 2, we consider the heavy-tailed high dimensional linear regression problem. In the ultrahigh-dimensional setting, where the dimensionality can grow exponentially with the sample size, we investigate the model selection oracle property and establish the asymptotic normality of a quantile-regression-based method called the WR-Lasso. We show that only mild conditions on the model error distribution are needed. Our theoretical results also reveal that an adaptive choice of the weight vector is essential for the WR-Lasso to enjoy these asymptotic properties. To make the WR-Lasso practically feasible, we propose a two-step procedure, called the adaptive robust Lasso (AR-Lasso), in which the weight vector in the second step is constructed from the L_1-penalized quantile regression estimate obtained in the first step. In Chapter 3, we analyze the issue of measurement errors in high dimensional linear classification problems. For such settings, we propose a new estimator, called the robust sparse linear discriminant, that recovers the sparse signal and adapts to the unknown noise level simultaneously. In contrast to existing methods, we show that this new method retains low risk properties even in the presence of measurement errors. Moreover, we propose a new algorithm that recovers the solution paths for a continuum of regularization parameter values. | en_US |
dc.language.iso | en | en_US |
dc.publisher | Princeton, NJ : Princeton University | en_US |
dc.relation.isformatof | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the [library's main catalog](http://catalog.princeton.edu). | en_US |
dc.subject | Classification | en_US |
dc.subject | Fisher Discriminant | en_US |
dc.subject | Generalized Linear Models | en_US |
dc.subject | High Dimensional Models | en_US |
dc.subject | Penalized Estimators | en_US |
dc.subject | Statistics | en_US |
dc.subject.classification | Statistics | en_US |
dc.subject.classification | Mathematics | en_US |
dc.subject.classification | Biostatistics | en_US |
dc.title | Variable Selection and Prediction in High Dimensional Models | en_US |
dc.type | Academic dissertations (Ph.D.) | en_US |
pu.projectgrantnumber | 690-2143 | en_US |
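The abstract above describes two procedures concretely enough to sketch. First, the CSIS idea: rank each remaining predictor by its conditional contribution in a generalized linear model that already contains the known set of variables. The sketch below is only an illustration under stated assumptions — a logistic (binomial) GLM, the absolute fitted coefficient of the candidate variable as the ranking statistic, and `statsmodels` as the fitting routine; none of these specifics are taken from the thesis itself.

```python
# Illustrative sketch of conditional marginal screening (CSIS-style) for a
# logistic GLM. The ranking statistic and the statsmodels-based fit are
# assumptions made for this example, not the thesis's exact procedure.
import numpy as np
import statsmodels.api as sm

def csis_rank(X, y, cond_idx, n_keep=50):
    """Rank candidate columns of X by the size of their coefficient in a GLM
    that already contains the known conditioning variables `cond_idx`."""
    n, p = X.shape
    candidates = [j for j in range(p) if j not in set(cond_idx)]
    scores = {}
    for j in candidates:
        cols = list(cond_idx) + [j]
        design = sm.add_constant(X[:, cols])     # intercept + known set + candidate
        fit = sm.GLM(y, design, family=sm.families.Binomial()).fit()
        scores[j] = abs(fit.params[-1])          # coefficient of the candidate X_j
    ranked = sorted(candidates, key=lambda j: scores[j], reverse=True)
    return ranked[:n_keep]                       # indices of the retained variables
```

Second, the two-step AR-Lasso: an initial L_1-penalized quantile regression, followed by a weighted fit whose penalty weights are built from the first-step estimate. The sketch below assumes scikit-learn's `QuantileRegressor` for both steps and an inverse-magnitude weight rule; the thesis constructs the weights from the first-step estimate in its own way, so this is a rough illustration rather than the actual algorithm.

```python
# Illustrative sketch of a two-step weighted robust Lasso built on L1-penalized
# quantile regression. The inverse-magnitude weight rule is an assumption; the
# weighted penalty is implemented via the standard column-rescaling trick.
import numpy as np
from sklearn.linear_model import QuantileRegressor

def ar_lasso_sketch(X, y, tau=0.5, alpha1=0.1, alpha2=0.1, eps=1e-4):
    # Step 1: initial L1-penalized quantile (median, tau=0.5) regression.
    step1 = QuantileRegressor(quantile=tau, alpha=alpha1, solver="highs").fit(X, y)
    beta_init = step1.coef_

    # Weights: variables with small first-step coefficients are penalized more.
    w = 1.0 / (np.abs(beta_init) + eps)

    # Step 2: weighted L1 penalty. Rescaling column j by 1/w_j and refitting
    # with a uniform L1 penalty is equivalent to penalizing |beta_j| by w_j.
    step2 = QuantileRegressor(quantile=tau, alpha=alpha2, solver="highs").fit(X / w, y)
    return step2.coef_ / w                       # map back to the original scale
```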
Appears in Collections: Operations Research and Financial Engineering
Files in This Item:
File | Description | Size | Format
---|---|---|---
Barut_princeton_0181D_10632.pdf | | 558.08 kB | Adobe PDF
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.