Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01gt54kq888
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Chen, Yuxin | -
dc.contributor.advisor | Brinton, Christopher | -
dc.contributor.author | Collins, Liam | -
dc.date.accessioned | 2019-08-16T17:56:47Z | -
dc.date.available | 2019-08-16T17:56:47Z | -
dc.date.created | 2019-04-22 | -
dc.date.issued | 2019-08-16 | -
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp01gt54kq888 | -
dc.description.abstract | Dimensionality-reduction and information-extraction techniques are becoming increasingly valuable as more data is collected. Yet popular techniques such as principal component analysis (PCA) and the singular value decomposition (SVD) generate factors with negative entries, severely diminishing their interpretability for nonnegative data. Nonnegative matrix factorization (NMF) is a dimensionality-reduction and information-extraction technique that aims to obtain two low-dimensional, nonnegative factors of a nonnegative dataset, where one factor is a matrix of features and the other is a matrix of weights. These features and weights reveal useful properties of the data, as their nonnegativity implies an additive features model with insightful physical interpretations in many applications. However, with greater potential insight comes greater computational difficulty: the nonnegativity constraints on the features and weights make computing globally optimal factors a nonconvex, NP-hard problem. Many iterative heuristics for NMF tend to converge to an effective solution in practice but lack general performance guarantees: their behavior is not yet thoroughly understood, especially in relation to the type of data being factored. Conversely, numerous algorithms have recently been developed with provable error bounds under certain assumptions on the data, but their practicality is questionable. Meanwhile, many initialization methods have been developed to improve the performance of NMF algorithms, yet comparative experimentation and analysis quantifying their efficacy remain insufficient in the literature. In this thesis, we comprehensively evaluate the performance of the most popular NMF algorithms and initialization techniques across varying characteristics of synthetic and real datasets in order to paint a clearer picture of when certain methods perform better or worse than others. Our analysis includes extensive background on the theory and intuition behind each technique and an assessment of how that theory and intuition play out in practice. We also evaluate how well NMF algorithms solve a particular practical problem, namely extracting latent variables from an educational dataset. Our results suggest that no single algorithm or initialization technique always performs best; for optimal performance, the NMF algorithm and initialization must be chosen based on the specific problem setting, and our work provides guidance for doing so. | en_US
dc.format.mimetype | application/pdf | -
dc.language.iso | en | en_US
dc.title | Nonnegative Matrix Factorization: An Empirical Analysis | en_US
dc.type | Princeton University Senior Theses | -
pu.date.classyear | 2019 | en_US
pu.department | Electrical Engineering | en_US
pu.pdf.coverpage | SeniorThesisCoverPage | -
pu.contributor.authorid | 961131288 | -
pu.certificate | Applications of Computing Program | en_US
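
Note: the abstract above describes NMF as factoring a nonnegative data matrix into a nonnegative feature matrix and a nonnegative weight matrix via iterative heuristics run from some initialization. As a concrete illustration only (not code from the thesis), the following is a minimal Python sketch of one widely used heuristic, the Lee-Seung multiplicative-update rule for the Frobenius-norm objective; the function name, random initialization, and toy data are all hypothetical.

import numpy as np

def nmf_multiplicative(V, r, n_iter=200, eps=1e-10, seed=0):
    """Lee-Seung multiplicative updates minimizing ||V - W H||_F^2.

    V : (m, n) nonnegative data matrix
    r : target rank (number of latent features)
    Returns nonnegative factors W (m, r) and H (r, n).
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r))  # random nonnegative initialization (one of many options)
    H = rng.random((r, n))
    for _ in range(n_iter):
        # elementwise multiplicative updates keep the factors nonnegative;
        # eps guards against division by zero
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# toy usage: factor a random 100 x 40 nonnegative matrix into rank-5 factors
V = np.abs(np.random.default_rng(1).random((100, 40)))
W, H = nmf_multiplicative(V, r=5)
print(np.linalg.norm(V - W @ H))  # Frobenius reconstruction error
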
Appears in Collections: Electrical Engineering, 1932-2020

Files in This Item:
File | Description | Size | Format
COLLINS-LIAM-THESIS.pdf | - | 4.23 MB | Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.