Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp010c483n34m
Title: Learning cross-lingual word embeddings for sentiment analysis of microblog posts
Authors: Zou, Anne
Advisors: Fellbaum, Christiane
Department: Computer Science
Class Year: 2020
Abstract: With the growth in social media platforms, microblogs have become an important data source for sentiment analysis. Because sentiment analysis systems depend on high-quality annotated corpora, transfer learning techniques can be valuable. To repurpose models between different languages, current methods employ parallel resources, such as machine translation or bilingual sentiment lexicons. However, these resources are quite scarce. In this paper, we use monolingual resources and unsupervised techniques to induce cross-lingual task-specific word embeddings for the tasks of emoji prediction and sentiment classification of microblog posts from Twitter and Sina Weibo (commonly shortened to Weibo). Unlike the majority of related multilingual work, we do not use English as the source language. We leverage an enormous Mandarin Chinese language data set to train a monolingual model for emoji prediction, extract its trained embedding layer, and adapt it to support the English language. We apply the adapted word embeddings to new cross-lingual English models on the same task as well as a related task, to gauge their transfer potential. Our cross-lingual English models are competitive with monolingual models, achieving 11.8% accuracy at emoji prediction (out of 64 emojis) and 73.2% at binary sentiment classification. Despite the linguistic distance between Chinese and English, our results show strong transfer performance, supporting the assumption that the languages' embedding spaces are similar in topology. We use this assumption to estimate emotional meanings for unique, Weibo-specific emojis without straightforward English translations. Our analyses also reveal that increased diversity of emoji labels during Chinese emoji-prediction pre-training improved downstream sentiment classification.
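The pipeline described in the abstract (train a Chinese emoji-prediction model, extract its embedding layer, map English word vectors into that space, and reuse the trained classifier) can be illustrated roughly as follows. This is a minimal, hypothetical NumPy sketch: the thesis itself relies on monolingual resources and unsupervised techniques, whereas the sketch assumes a small seed dictionary and an orthogonal (Procrustes) mapping purely for brevity; all array names, sizes, and the seed pairs are placeholders, not the thesis's actual method.

# Hypothetical sketch: aligning English word vectors into the embedding space of a
# Chinese emoji-prediction model so that model's classifier can be reused on English text.
# Assumes a small seed dictionary and an orthogonal (Procrustes) mapping for brevity;
# the thesis uses unsupervised cross-lingual techniques instead.
import numpy as np

rng = np.random.default_rng(0)
dim = 300

# Stand-ins for real embeddings: rows are words, columns are embedding dimensions.
zh_emb = rng.normal(size=(5000, dim))   # embedding layer extracted from the Chinese model
en_emb = rng.normal(size=(4000, dim))   # monolingual English embeddings

# Seed dictionary: pairs (english_index, chinese_index) of translation equivalents.
seed_pairs = [(i, i) for i in range(500)]          # placeholder pairs for illustration
X = en_emb[[e for e, _ in seed_pairs]]             # English side of the dictionary
Y = zh_emb[[z for _, z in seed_pairs]]             # Chinese side of the dictionary

# Orthogonal Procrustes: find orthogonal W minimizing ||XW - Y||_F, solved in
# closed form from the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

# Map every English vector into the Chinese embedding space; the emoji classifier
# trained on top of zh_emb can then, in principle, be applied to English posts.
en_in_zh_space = en_emb @ W
print(en_in_zh_space.shape)  # (4000, 300)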
URI: http://arks.princeton.edu/ark:/88435/dsp010c483n34m
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Computer Science, 1988-2020

Files in This Item:
File                   Size      Format
ZOU-ANNE-THESIS.pdf    3.21 MB   Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.