Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp016395w727c
Title: | The veryyyy best paper you will EVER read!!!! ;) #sarcastic Sarcasm Detection on Twitter |
Authors: | Anderson, Samantha |
Advisors: | Blei, David |
Department: | Computer Science |
Class Year: | 2014 |
Abstract: | With the growing popularity of texting and social media sites in recent years, people increasingly express themselves and communicate through text. Without vocal cues, it can be difficult to decipher the author's true meaning and intention. Sarcasm, hyperbole, and other rich aspects of spoken language can be lost in text. My project is motivated by the lack of understanding of how to detect sarcasm and by the misinterpretations and confusion that may occur as a result. Because Twitter data is readily available and abundant, it was the ideal site to use for experimentation. I used sentiment analysis as a starting point for my research because sarcasm detection is a relatively new area of study. Sentiment analysis is closely related to sarcasm detection and has been researched extensively. Existing datasets for sentiment analysis are easy to find, which made it simple to start experimenting with techniques and gain hands-on experience with a text classification task. By replicating previous research and combining ideas from a variety of papers, I achieved accuracies comparable to previous research for sentiment analysis. After collecting my own Twitter sarcasm corpus, which was labeled automatically using the hashtags in each tweet, I began experimenting with methods for sarcasm detection. The methods I developed for sentiment analysis were easily adapted for sarcasm detection and set a high baseline to beat. I created two novel types of features to try to improve the classification accuracy. The first was a shift feature, which attempts to capture dramatic shifts in sentiment or opinion within the tweet. The second was designed to incorporate context from other tweets. These context features look at a specific topic and contrast the prevailing opinion on Twitter with the opinion of the writer of a new tweet.
Overall, there were promising results which suggest that computers can detect sarcasm more accurately than people. By attempting to classify tweets as sarcastic, I aimed to discover the most important features of sarcasm in text. These discovered features can help further the understanding of how to automatically detect sarcasm as well as help users understand how to better convey their sarcasm. Sarcasm can entirely reverse the sentiment of a sentence, so the results of this experiment may also improve existing sentiment analysis or text summarization systems. |
Extent: | 36 pages |
URI: | http://arks.princeton.edu/ark:/88435/dsp016395w727c |
Type of Material: | Princeton University Senior Theses |
Language: | en_US |
Appears in Collections: | Computer Science, 1988-2020 |
Files in This Item:
File | Size | Format |
---|---|---|
Anderson_Samantha_thesis.pdf | 618.4 kB | Adobe PDF |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.