Word Embedding Measures of Bias in Automatic Summarization

Jiang, May

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01zk51vk746

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Fellbaum, Christiane	-
dc.contributor.author	Jiang, May	-
dc.date.accessioned	2020-08-12T13:14:12Z	-
dc.date.available	2020-08-12T13:14:12Z	-
dc.date.created	2020-05-03	-
dc.date.issued	2020-08-12	-
dc.identifier.uri	http://arks.princeton.edu/ark:/88435/dsp01zk51vk746	-
dc.description.abstract	With the rapidly growing volume of textual data and the accompanying development of machine learning methods to learn from that data, the fact that such data are infused with human biases, and that such biases propagate to these models and through them into downstream applications, has become ever more important. Word embeddings, in particular, have been shown to inherit biases and have consequently been used for quantifying bias in text corpora. The objective of this work is to examine word embedding measures of bias, probing into the factors that affect their results and, in light of these characteristics, the reliability and applicability of these metrics. Investigating these measures in application, we find that factors such as length and gender ratios within a corpus can significantly influence or distort the bias measured. With this in mind, we analyze bias in summarization, studying how bias in a text propagates to its summary to gain insight into the bias carried in and from text, and find that, controlling for length, summaries (the articles in concentrated form) are also more concentrated in measurable bias. Finally, we extend our analysis and leverage word embeddings to encapsulate broader aspects of bias and illuminate more nuances in text, introducing novel approaches to incorporate sentiment in bias measurement and to effectively detect and extract coherent personality stereotype clusters and their biases.	en_US
dc.format.mimetype	application/pdf	-
dc.language.iso	en	en_US
dc.title	Word Embedding Measures of Bias in Automatic Summarization	en_US
dc.type	Princeton University Senior Theses	-
pu.date.classyear	2020	en_US
pu.department	Computer Science	en_US
pu.pdf.coverpage	SeniorThesisCoverPage	-
pu.contributor.authorid	961089079	-
Appears in Collections:	Computer Science, 1988-2020

Files in This Item:

File	Description	Size	Format
JIANG-MAY-THESIS.pdf		666.66 kB	Adobe PDF
