Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp011v53k094h
Title: | Applications of Natural Language Processing on Twitter Data to Predict Stock Price Movement |
Authors: | Hamilton, Justin |
Advisors: | Cattaneo, Matias |
Department: | Operations Research and Financial Engineering |
Certificate Program: | Applications of Computing Program |
Class Year: | 2020 |
Abstract: | Most consumers are involved in some form of social media, and with the rise of data analysis and natural language processing, the sentiment, actions, and news of the public are more quantifiable than ever. Much of the stock market is driven by news, with innovations, reports, and sometimes blunders announced every day, and by consumers, whose opinions and purchases ultimately fuel a large sector of the market. Because social media is so widely used and many people express their ideas and opinions freely on it, we explore the predictive relationship between social media sentiment and the stock price movement of several publicly traded companies on the New York Stock Exchange. Using public sentiment as a predictor of a company’s stock value has many applications for investors trying to make more informed decisions about which assets to trade. We apply machine learning models to Twitter posts related to a company to predict whether the closing price on a given day is significantly above or below the opening price, or remains within a threshold of it. We employ pre-trained sentiment analysis as well as a bag-of-words representation to use tweets as a proxy for public sentiment toward the companies we gather tweets for. We train on tweets from 2019 relating to large companies in their respective industries, and use historical financial data for training and testing. We find that the K-Nearest Neighbors algorithm has the highest out-of-sample accuracy, 75%, using data represented as a bag of words, with the Random Forest algorithm performing nearly as well at an out-of-sample accuracy of 74%. Most models tested also show promise given access to more data. |
URI: | http://arks.princeton.edu/ark:/88435/dsp011v53k094h |
Type of Material: | Princeton University Senior Theses |
Language: | en |
Appears in Collections: | Operations Research and Financial Engineering, 2000-2019 |
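The three-way prediction target described in the abstract (closing price significantly above the opening price, significantly below it, or within a threshold of it) might be sketched as follows. This is only an illustration under stated assumptions, not the thesis's code: the function name `label_movement`, the label strings, and the 1% default threshold are all hypothetical, since the abstract does not state the threshold used.

```python
def label_movement(open_price, close_price, threshold=0.01):
    """Classify one trading day by its open-to-close relative change.

    The 1% default threshold and the label names are illustrative
    assumptions; the abstract does not give the value actually used.
    """
    if open_price <= 0:
        raise ValueError("open_price must be positive")

    # Relative change of the close with respect to the open.
    change = (close_price - open_price) / open_price

    if change > threshold:
        return "up"      # close significantly above open
    if change < -threshold:
        return "down"    # close significantly below open
    return "flat"        # close within the threshold of the open


# Example: three hypothetical trading days as (open, close) pairs.
days = [(100.0, 102.0), (100.0, 98.0), (100.0, 100.5)]
labels = [label_movement(o, c) for o, c in days]
```

Labels of this kind would serve as the classification target, with features derived from each day's tweets (for example, bag-of-words counts) as the input.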
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
HAMILTON-JUSTIN-THESIS.pdf | | 750.94 kB | Adobe PDF | Request a copy |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.