Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01h989r552k
Title: Statistical Methods for Complex Datasets
Authors: Xia, Lucy
Advisors: Fan, Jianqing; Rigollet, Philippe
Contributors: Operations Research and Financial Engineering Department
Subjects: Statistics
Issue Date: 2015
Publisher: Princeton, NJ : Princeton University
Abstract: Due to the development of technology, modern datasets are evolving in terms of size and complexity. In particular, the availability of various datasets ranging from genomic sequencing data to internet data attracts our attention to the paradigm of "high-dimensional" statistics, where the number of measured parameters can be much larger than the sample size. This thesis addresses several challenges arising from "high-dimensional" settings, in three distinct fields of statistics. In particular, we propose new methods that could efficiently approach a broader class of distributions beyond Gaussian, and we develop new model averaging schemes when complex datasets cannot be well-explained by a single model. More specifically, in the first part of this thesis, we study binary-labelled classification under elliptical models. We propose using Rayleigh quotients as the criterion to select quadratic classification frontiers. This formulation not only includes Fisher's linear discriminant as a special case, but further considers the interactions between covariates. In the second part, we study the network estimation problem in graphical models. We estimate the existence of edges through evaluating conditional distance correlations. Similar to the previous part, our method can deal with a more general class of distributions beyond Gaussian and thus serves as a generalization of Gaussian graphical models. Lastly, we study a classical problem on aggregating a general collection of affine estimators for fixed design regression. We propose a generalized Q-Aggregation scheme of affine estimators, and prove sharp oracle inequalities that hold with high probability. We also apply our results to universal aggregation and show that our proposed estimator leads simultaneously to all the best known bounds for aggregation with high probability.
URI: http://arks.princeton.edu/ark:/88435/dsp01h989r552k
Alternate format: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog.
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Operations Research and Financial Engineering

Files in This Item:
File: Xia_princeton_0181D_11431.pdf
Size: 1.31 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.