Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp018623j117g
Title: | A Supervised Topic Model Based Approach for Change Point Detection in Review Data |
Authors: | Wang, Jean |
Advisors: | Vanderbei, Robert |
Department: | Operations Research and Financial Engineering |
Class Year: | 2016 |
Abstract: | As review data becomes more widespread and more influential in consumer decisions, identifying major changes in products or businesses becomes increasingly useful. Traditionally, change point identification in review data has focused on star ratings, which we show can be unreliable given the wide fluctuations that occur in real-world data. In this thesis, we propose and implement a new method of change point detection in review data that incorporates a previously overlooked element: the review text. By using topic modeling to discover the topics discussed in reviews, we can detect shifts in an establishment when the words describing it shift. Furthermore, by using supervised Latent Dirichlet Allocation (sLDA), we can incorporate the signal from the star ratings into our modeled topics for improved change point identification. We evaluated our new approach on a subset of human-annotated data, which showed that change point detection with sLDA had higher precision and recall than the other change point methods. Finally, we provide a close analysis of these methods on one establishment, illustrating how our novel approach can be used to identify the causes of a change in an establishment. |
Extent: | 64 pages |
URI: | http://arks.princeton.edu/ark:/88435/dsp018623j117g |
Type of Material: | Princeton University Senior Theses |
Language: | en_US |
Appears in Collections: | Operations Research and Financial Engineering, 2000-2019 |
Files in This Item:
File | Size | Format |
---|---|---|
Wang_Jean_final_thesis.pdf | 979.87 kB | Adobe PDF |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.