Statistical Methods for Complex Datasets

Xia, Lucy

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01h989r552k

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Fan, Jianqing	en_US
dc.contributor.advisor	Rigollet, Philippe	en_US
dc.contributor.author	Xia, Lucy	en_US
dc.contributor.other	Operations Research and Financial Engineering Department	en_US
dc.date.accessioned	2015-06-23T19:39:46Z	-
dc.date.available	2015-06-23T19:39:46Z	-
dc.date.issued	2015	en_US
dc.identifier.uri	http://arks.princeton.edu/ark:/88435/dsp01h989r552k	-
dc.description.abstract	Advances in technology have made modern datasets larger and more complex. In particular, the availability of datasets ranging from genomic sequencing data to internet data draws attention to the paradigm of "high-dimensional" statistics, where the number of measured parameters can be much larger than the sample size. This thesis addresses several challenges arising in "high-dimensional" settings, in three distinct fields of statistics. We propose new methods that can efficiently handle a broader class of distributions beyond the Gaussian, and we develop new model averaging schemes for complex datasets that cannot be well explained by a single model. More specifically, in the first part of this thesis, we study binary classification under elliptical models. We propose using Rayleigh quotients as the criterion for selecting quadratic classification frontiers. This formulation not only includes Fisher's linear discriminant as a special case, but also accounts for interactions between covariates. In the second part, we study the network estimation problem in graphical models. We infer the existence of edges by evaluating conditional distance correlations. As in the first part, our method can handle a more general class of distributions beyond the Gaussian and thus serves as a generalization of Gaussian graphical models. Lastly, we study the classical problem of aggregating a general collection of affine estimators for fixed design regression. We propose a generalized Q-Aggregation scheme for affine estimators and prove sharp oracle inequalities that hold with high probability. We also apply our results to universal aggregation and show that our proposed estimator simultaneously achieves all the best known bounds for aggregation with high probability.	en_US
dc.language.iso	en	en_US
dc.publisher	Princeton, NJ : Princeton University	en_US
dc.relation.isformatof	The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog (http://catalog.princeton.edu).	en_US
dc.subject.classification	Statistics	en_US
dc.title	Statistical Methods for Complex Datasets	en_US
dc.type	Academic dissertations (Ph.D.)	en_US
pu.projectgrantnumber	690-2143	en_US
Appears in Collections:	Operations Research and Financial Engineering

Files in This Item:

File	Description	Size	Format
Xia_princeton_0181D_11431.pdf		1.31 MB	Adobe PDF
