Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01g158bm124
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Fellbaum, Christiane | -
dc.contributor.author | Cheyette, Madeleine | -
dc.date.accessioned | 2019-07-24T19:54:27Z | -
dc.date.available | 2019-07-24T19:54:27Z | -
dc.date.created | 2019-05-03 | -
dc.date.issued | 2019-07-24 | -
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp01g158bm124 | -
dc.description.abstract | Word embeddings are a Natural Language Processing technique that captures how words are used in large collections of text. Because they reflect the contexts in which words appear, word embeddings can reveal similarities between words and show how those similarities may encode societal bias. Using news articles from the time of the 2016 election, this paper examines how bias against LGBTQ+ groups and women in word2vec embeddings differs between liberal and conservative publications, and how that bias changes after the election. We measure bias with a quantitative bias metric and find that, overall, bias against LGBTQ+ groups is higher in conservative publications and bias against women is slightly lower in conservative publications, although stereotypes appear across all publications. We explore these trends further by examining the associations between each identity group and professions (grouped by industry and salary level), personality traits, and family roles. We also find that bias generally increases across all publications after the election. To analyze the positive and negative sentiment of the personality traits associated with each marginalized group, we use SentiWordNet through NLTK. We cross-check our results with non-negative matrix factorization (NMF) topic modeling, which reveals contexts in the news articles where the word embeddings may not fully capture meaning. Finally, we examine embeddings trained on the Cornell Movie-Dialogs Corpus and find that they reflect biases similar to those in the news embeddings, suggesting that the news we consume today is as biased as the fictional stories we have consumed since 1927. | en_US
dc.format.mimetype | application/pdf | -
dc.language.iso | en | en_US
dc.title | Making the Invisible Visible: Increasing Transparency In the Media via Word Embeddings from Fox News to Star Wars | en_US
dc.type | Princeton University Senior Theses | -
pu.date.classyear | 2019 | en_US
pu.department | Computer Science | en_US
pu.pdf.coverpage | SeniorThesisCoverPage | -
pu.contributor.authorid | 961160181 | -
Appears in Collections: Computer Science, 1988-2020
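The abstract refers to a quantitative bias metric over word2vec embeddings without defining it in this record. As an illustration only, here is a minimal sketch of one common way to score such bias: a WEAT-style differential association, the mean cosine similarity of a target word to one group's attribute words minus its mean similarity to another group's. This is an assumption for illustration and not necessarily the metric used in the thesis; the function names (`cosine`, `association`, `bias_score`), the word lists, and the 3-dimensional toy vectors are all hypothetical stand-ins for real word2vec vectors.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]

# Hypothetical toy embeddings: 3-d vectors standing in for real word2vec output.
EMB: Dict[str, Vector] = {
    "he": [1.0, 0.0, 0.0],
    "man": [0.9, 0.1, 0.0],
    "she": [0.0, 1.0, 0.0],
    "woman": [0.1, 0.9, 0.0],
    "engineer": [0.8, 0.2, 0.1],
    "nurse": [0.2, 0.8, 0.1],
}
MALE = ["he", "man"]        # hypothetical attribute words for group A
FEMALE = ["she", "woman"]   # hypothetical attribute words for group B


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(word: str, group_a: List[str], group_b: List[str],
                emb: Dict[str, Vector]) -> float:
    """WEAT-style s(w, A, B): mean cos(w, a) over A minus mean cos(w, b) over B.

    A positive value means `word` lies closer to group A than to group B.
    """
    mean_a = sum(cosine(emb[word], emb[a]) for a in group_a) / len(group_a)
    mean_b = sum(cosine(emb[word], emb[b]) for b in group_b) / len(group_b)
    return mean_a - mean_b


def bias_score(targets: List[str], group_a: List[str], group_b: List[str],
               emb: Dict[str, Vector]) -> float:
    """Average differential association over target words (e.g. professions)."""
    return sum(association(t, group_a, group_b, emb) for t in targets) / len(targets)


if __name__ == "__main__":
    for target in ["engineer", "nurse"]:
        print(target, round(association(target, MALE, FEMALE, EMB), 3))
    print("mean", round(bias_score(["engineer", "nurse"], MALE, FEMALE, EMB), 3))
```

With these toy vectors, "engineer" scores positive (closer to the male words) and "nurse" scores negative, and the two cancel in the average. In practice the vectors would come from a trained word2vec model (for example, gensim's `KeyedVectors`), which is likewise an assumption here rather than a detail taken from the thesis.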

Files in This Item:
File | Description | Size | Format
CHEYETTE-MADELEINE-THESIS.pdf | | 2.18 MB | Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.