Feature Screening for the Lasso

Wang, Yun

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01hq37vq979

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Ramadge, Peter J.	en_US
dc.contributor.author	Wang, Yun	en_US
dc.contributor.other	Electrical Engineering Department	en_US
dc.date.accessioned	2015-12-07T19:56:17Z	-
dc.date.available	2015-12-07T19:56:17Z	-
dc.date.issued	2015	en_US
dc.identifier.uri	http://arks.princeton.edu/ark:/88435/dsp01hq37vq979	-
dc.description.abstract	Recently, the sparse representation of data with respect to a dictionary of features has contributed to successful new methods in machine learning, pattern analysis, statistics and signal processing. At the heart of many sparse representation methods is the least squares problem with l1 regularization, often called the lasso problem. Although the lasso has been studied extensively, its applicability to large-scale problems has been hindered by its high computational cost. This dissertation investigates feature screening for the lasso problem, targeted at this computational issue. For a given lasso problem, screening quickly identifies a subset of features that will receive zero weight in a solution. These features can be removed from the dictionary, prior to solving the problem, without impacting the optimality of the solution obtained. This has two potential advantages: it reduces the size of the dictionary, allowing the lasso problem to be solved with fewer resources, and it speeds up obtaining a solution. Current classes of one-shot screening tests are based on bounding the dual lasso solution within a sphere or the intersection of a sphere and a half space. We propose an optimal screening test when the dual solution is bounded within the intersection of a sphere and two half spaces, and empirically investigate the trade-off that this test makes between screening power and computational efficiency. We then go beyond the regime of one-shot screening and examine a sequential screening scheme for one target lasso problem. Using analytical and empirical means, we give insight into how the values of this sequence should be chosen and show that well-designed sequential screening yields significant improvement in dictionary reduction and computational efficiency for lightly regularized lasso problems. We propose and explore a data-adaptive sequential screening (DASS) scheme, which achieves state-of-the-art performance. As one application of DASS, we show how it can also facilitate faster completion of sparse representation decision tasks, such as classification, without affecting statistical accuracy. In particular, for clip-level music genre classification, our proposed method yields improved clip classification accuracy and considerable computational speedup.	en_US
dc.language.iso	en	en_US
dc.publisher	Princeton, NJ : Princeton University	en_US
dc.relation.isformatof	The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: http://catalog.princeton.edu/	en_US
dc.subject	classification	en_US
dc.subject	feature screening	en_US
dc.subject	feature selection	en_US
dc.subject	lasso	en_US
dc.subject	machine learning	en_US
dc.subject	sparse representation/regression	en_US
dc.subject.classification	Electrical engineering	en_US
dc.subject.classification	Statistics	en_US
dc.subject.classification	Computer science	en_US
dc.title	Feature Screening for the Lasso	en_US
dc.type	Academic dissertations (Ph.D.)	en_US
pu.projectgrantnumber	690-2143	en_US
Appears in Collections:	Electrical Engineering

Files in This Item:

File	Description	Size	Format
Wang_princeton_0181D_11528.pdf		3.16 MB	Adobe PDF
