Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01h989r552k
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Fan, Jianqing | en_US
dc.contributor.advisor | Rigollet, Philippe | en_US
dc.contributor.author | Xia, Lucy | en_US
dc.contributor.other | Operations Research and Financial Engineering Department | en_US
dc.date.accessioned | 2015-06-23T19:39:46Z | -
dc.date.available | 2015-06-23T19:39:46Z | -
dc.date.issued | 2015 | en_US
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp01h989r552k | -
dc.description.abstract | Due to the development of technology, modern datasets are evolving in terms of size and complexity. In particular, the availability of various datasets ranging from genomic sequencing data to internet data attracts our attention to the paradigm of "high-dimensional" statistics, where the number of measured parameters can be much larger than the sample size. This thesis addresses several challenges arising from "high-dimensional" settings, in three distinct fields of statistics. In particular, we propose new methods that could efficiently approach a broader class of distributions beyond Gaussian, and we develop new model averaging schemes when complex datasets cannot be well-explained by a single model. More specifically, in the first part of this thesis, we study binary-labelled classification under elliptical models. We propose using Rayleigh quotients as the criterion to select quadratic classification frontiers. This formulation not only includes Fisher's linear discriminant as a special case, but further considers the interactions between covariates. In the second part, we study the network estimation problem in graphical models. We estimate the existence of edges through evaluating conditional distance correlations. Similar to the previous part, our method can deal with a more general class of distributions beyond Gaussian and thus serves as a generalization of Gaussian graphical models. Lastly, we study a classical problem on aggregating a general collection of affine estimators for fixed design regression. We propose a generalized Q-Aggregation scheme of affine estimators, and prove sharp oracle inequalities that hold with high probability. We also apply our results to universal aggregation and show that our proposed estimator leads simultaneously to all the best known bounds for aggregation with high probability. | en_US
dc.language.iso | en | en_US
dc.publisher | Princeton, NJ : Princeton University | en_US
dc.relation.isformatof | The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog (http://catalog.princeton.edu). | en_US
dc.subject.classification | Statistics | en_US
dc.title | Statistical Methods for Complex Datasets | en_US
dc.type | Academic dissertations (Ph.D.) | en_US
pu.projectgrantnumber | 690-2143 | en_US
Appears in Collections: Operations Research and Financial Engineering

Files in This Item:
File | Description | Size | Format
Xia_princeton_0181D_11431.pdf | | 1.31 MB | Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.