Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01g158bm124
Title: Making the Invisible Visible: Increasing Transparency In the Media via Word Embeddings from Fox News to Star Wars
Authors: Cheyette, Madeleine
Advisors: Fellbaum, Christiane
Department: Computer Science
Class Year: 2019
Abstract: Word embeddings are a Natural Language Processing technique that captures how words are used in large text documents. Because they reflect the contexts in which words appear, word embeddings can reveal similarities between words and show how those similarities might reflect societal bias. Using news articles from the time of the 2016 election, this paper examines how bias against LGBTQ+ groups and women in word2vec embeddings differs between liberal and conservative publications, and how that bias changes after the election. We analyze these results with a quantitative bias metric and find that, overall, bias against LGBTQ+ groups is higher in conservative publications and bias against women is slightly lower in conservative publications, but stereotypes appear across all publications. We explore these trends further by examining the embeddings’ associations between each identity group and professions (grouped by industry and salary level), personality traits, and familial terms. After the election, we find that bias largely increases across all publications. To further analyze the positive and negative sentiment of the personality traits associated with each marginalized group, we use the NLTK SentiWordNet library. We cross-analyze our results with NMF topic modeling, which reveals situations in the news articles where the word embeddings might not fully capture meaning. Finally, we examine embeddings generated from the Cornell Movie Dialogue Corpus and find that they reflect biases similar to those in the news embeddings, suggesting that the news we consume today is as biased as the fictional stories we’ve consumed since 1927.
URI: http://arks.princeton.edu/ark:/88435/dsp01g158bm124
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Computer Science, 1988-2020
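
The abstract mentions a quantitative bias metric computed over word embeddings but does not define it. As a rough illustration only, the sketch below shows one common way such a metric can be formed: the difference in mean cosine similarity between a target word and two groups of identity words. The function names, the toy two-dimensional vectors, and the choice of metric are all assumptions made for this example; they are not taken from the thesis, and real word2vec vectors would have far more dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association_gap(embeddings, target, group_a, group_b):
    """Mean similarity of `target` to group_a minus its mean similarity to group_b.

    A positive value means the target word sits closer to group_a in the
    embedding space; a negative value means it sits closer to group_b.
    This is a hypothetical metric for illustration, not the thesis's own.
    """
    t = embeddings[target]
    mean_a = sum(cosine(t, embeddings[w]) for w in group_a) / len(group_a)
    mean_b = sum(cosine(t, embeddings[w]) for w in group_b) / len(group_b)
    return mean_a - mean_b


# Toy, hand-made vectors (hypothetical; not drawn from any trained model).
toy = {
    "engineer": [0.9, 0.1],
    "nurse": [0.1, 0.9],
    "he": [1.0, 0.0],
    "she": [0.0, 1.0],
}

gap_engineer = association_gap(toy, "engineer", ["he"], ["she"])
gap_nurse = association_gap(toy, "nurse", ["he"], ["she"])
```

With these toy vectors, gap_engineer is positive and gap_nurse is negative, which is the kind of asymmetry a profession-association analysis would look for. In practice the dictionary would be replaced by vectors from a trained embedding model, and each group would contain several words to reduce noise.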

Files in This Item:
File: CHEYETTE-MADELEINE-THESIS.pdf
Size: 2.18 MB
Format: Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.