Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01fq977x51j
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Fish, Robert | -
dc.contributor.author | Huang, Betty | -
dc.date.accessioned | 2018-08-14T18:10:27Z | -
dc.date.available | 2018-08-14T18:10:27Z | -
dc.date.created | 2018-05-07 | -
dc.date.issued | 2018-08-14 | -
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp01fq977x51j | -
dc.description.abstract | Many high-end cosmetics have drugstore duplicates (“dupes” for short) that achieve the same effect as the original at a much lower price point. There are many blog posts, articles, and web forums that recommend dupes for various products. However, it is tedious to search through the web to find all this information and cross-reference it with product reviews to come to a purchasing decision. We present a novel dupe-calculation method by using Linear-Chain Conditional Random Fields (CRF) to perform Product Named Entity Recognition (PNER) of scraped Google search results to extract dupe product names. We build a web and mobile front-end to display the data. The results and performance proved better than existing competitors, and show this method has much potential in exploiting this niche market. | en_US
dc.format.mimetype | application/pdf | -
dc.language.iso | en | en_US
dc.title | It’s a Dupe: High-End Cosmetics for Less with Product Named Entity Recognition of Scraped Web Pages | en_US
dc.type | Princeton University Senior Theses | -
pu.date.classyear | 2018 | en_US
pu.department | Computer Science | en_US
pu.pdf.coverpage | SeniorThesisCoverPage | -
pu.contributor.authorid | 960919866 | -
Appears in Collections: Computer Science, 1988-2020

Files in This Item:
File | Description | Size | Format
HUANG-BETTY-THESIS.pdf | | 1.83 MB | Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.