Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp01pz50gw21f
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Fan, Jianqing | en_US |
dc.contributor.author | Barut, Ahmet Emre | en_US |
dc.contributor.other | Operations Research and Financial Engineering Department | en_US |
dc.date.accessioned | 2013-09-16T17:26:27Z | - |
dc.date.available | 2013-09-16T17:26:27Z | - |
dc.date.issued | 2013 | en_US |
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp01pz50gw21f | - |
dc.description.abstract | The aim of this thesis is to develop methods for variable selection and statistical prediction in high dimensional statistical problems. Along with proposing new procedures, the thesis also studies their theoretical properties and establishes bounds on the statistical error of the resulting estimators. The main body of the thesis is divided into three parts. Chapter 1 discusses a variable screening method for generalized linear models; the emphasis is on providing a procedure that reduces the number of variables in a reliable and fast manner. Chapter 2 considers the linear regression problem in high dimensions when the noise has heavy tails; to perform robust variable selection, a new method, called the adaptive robust Lasso, is introduced. Finally, Chapter 3 addresses high dimensional classification problems; a robust approach is proposed and its theoretical properties are established. Overall, the methods proposed in this thesis collectively address many of the issues arising in high dimensional statistics, from screening to variable selection. In Chapter 1, we study the variable screening problem for generalized linear models. In many applications, researchers have prior knowledge that a certain set of variables is related to the response. In such a situation, a natural assessment of the relative importance of the other predictors is the conditional contribution of each individual predictor in the presence of the known set of variables. This leads to conditional sure independence screening (CSIS). We propose and study CSIS in the context of generalized linear models. For ultrahigh-dimensional statistical problems, we give conditions under which sure screening is possible and derive an upper bound on the number of selected variables. We also spell out the situation under which CSIS yields model selection consistency. In Chapter 2, we consider the heavy-tailed high dimensional linear regression problem. In the ultrahigh-dimensional setting, where the dimensionality can grow exponentially with the sample size, we investigate the model selection oracle property and establish the asymptotic normality of a quantile-regression-based method called the WR-Lasso. We show that only mild conditions on the model error distribution are needed. Our theoretical results also reveal that an adaptive choice of the weight vector is essential for the WR-Lasso to enjoy these asymptotic properties. To make the WR-Lasso practically feasible, we propose a two-step procedure, called the adaptive robust Lasso (AR-Lasso), in which the weight vector in the second step is constructed from the L_1-penalized quantile regression estimate obtained in the first step. In Chapter 3, we analyze the issue of measurement errors in high dimensional linear classification problems. For such settings, we propose a new estimator, called the robust sparse linear discriminant, that recovers the sparse signal and adapts to the unknown noise level simultaneously. In contrast to existing methods, we show that this new method retains low risk properties even in the presence of measurement errors. Moreover, we propose a new algorithm that recovers the solution paths for a continuum of regularization parameter values. | en_US |
dc.language.iso | en | en_US |
dc.publisher | Princeton, NJ : Princeton University | en_US |
dc.relation.isformatof | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the [library's main catalog](http://catalog.princeton.edu). | en_US |
dc.subject | Classification | en_US |
dc.subject | Fisher Discriminant | en_US |
dc.subject | Generalized Linear Models | en_US |
dc.subject | High Dimensional Models | en_US |
dc.subject | Penalized Estimators | en_US |
dc.subject | Statistics | en_US |
dc.subject.classification | Statistics | en_US |
dc.subject.classification | Mathematics | en_US |
dc.subject.classification | Biostatistics | en_US |
dc.title | Variable Selection and Prediction in High Dimensional Models | en_US |
dc.type | Academic dissertations (Ph.D.) | en_US |
pu.projectgrantnumber | 690-2143 | en_US |
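The abstract above describes two procedures concretely enough to sketch. First, the CSIS idea: rank each remaining predictor by its conditional contribution in a generalized linear model that already contains the known set of variables. The sketch below is only an illustration under stated assumptions — a logistic (binomial) GLM, the absolute fitted coefficient of the candidate variable as the ranking statistic, and `statsmodels` as the fitting routine; none of these specifics are taken from the thesis itself.

```python
# Illustrative sketch of conditional marginal screening (CSIS-style) for a
# logistic GLM. The ranking statistic and the statsmodels-based fit are
# assumptions made for this example, not the thesis's exact procedure.
import numpy as np
import statsmodels.api as sm

def csis_rank(X, y, cond_idx, n_keep=50):
    """Rank candidate columns of X by the size of their coefficient in a GLM
    that already contains the known conditioning variables `cond_idx`."""
    n, p = X.shape
    candidates = [j for j in range(p) if j not in set(cond_idx)]
    scores = {}
    for j in candidates:
        cols = list(cond_idx) + [j]
        design = sm.add_constant(X[:, cols])     # intercept + known set + candidate
        fit = sm.GLM(y, design, family=sm.families.Binomial()).fit()
        scores[j] = abs(fit.params[-1])          # coefficient of the candidate X_j
    ranked = sorted(candidates, key=lambda j: scores[j], reverse=True)
    return ranked[:n_keep]                       # indices of the retained variables
```

Second, the two-step AR-Lasso: an initial L_1-penalized quantile regression, followed by a weighted fit whose penalty weights are built from the first-step estimate. The sketch below assumes scikit-learn's `QuantileRegressor` for both steps and an inverse-magnitude weight rule; the thesis constructs the weights from the first-step estimate in its own way, so this is a rough illustration rather than the actual algorithm.

```python
# Illustrative sketch of a two-step weighted robust Lasso built on L1-penalized
# quantile regression. The inverse-magnitude weight rule is an assumption; the
# weighted penalty is implemented via the standard column-rescaling trick.
import numpy as np
from sklearn.linear_model import QuantileRegressor

def ar_lasso_sketch(X, y, tau=0.5, alpha1=0.1, alpha2=0.1, eps=1e-4):
    # Step 1: initial L1-penalized quantile (median, tau=0.5) regression.
    step1 = QuantileRegressor(quantile=tau, alpha=alpha1, solver="highs").fit(X, y)
    beta_init = step1.coef_

    # Weights: variables with small first-step coefficients are penalized more.
    w = 1.0 / (np.abs(beta_init) + eps)

    # Step 2: weighted L1 penalty. Rescaling column j by 1/w_j and refitting
    # with a uniform L1 penalty is equivalent to penalizing |beta_j| by w_j.
    step2 = QuantileRegressor(quantile=tau, alpha=alpha2, solver="highs").fit(X / w, y)
    return step2.coef_ / w                       # map back to the original scale
```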
Appears in Collections: Operations Research and Financial Engineering
Files in This Item:
File | Description | Size | Format
---|---|---|---
Barut_princeton_0181D_10632.pdf | | 558.08 kB | Adobe PDF
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.