Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp01pz50gz72c
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Narayanan, Arvind | - |
dc.contributor.author | Abdulhusein, Neamah | - |
dc.date.accessioned | 2017-07-20T14:19:24Z | - |
dc.date.available | 2017-07-20T14:19:24Z | - |
dc.date.created | 2017-05-06 | - |
dc.date.issued | 2017-05-06 | - |
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp01pz50gz72c | - |
dc.description.abstract | Recent analyses of gender and other categories of bias in word embeddings have focused on developing algorithmic methods to remove these unwanted biases from vector space models. However, little work has been done to connect the biases present in word embeddings with what they can tell us about the state of the environment in which the language is spoken. In this work, we explore whether we can extract empirical information about the world around us from the vector space models of the languages we speak. We explore this question specifically with regard to gender bias. We conduct an intra-lingual experiment to determine whether gender associations of sports words in a language model L can predict the percentage of female participants from countries that speak L in those sports in the Summer Olympics. We conduct two inter-lingual experiments to determine whether the gender score of a language can be used to predict country-specific gender statistics, such as the UN Gender Inequality Index (GII). Our intra-lingual experiments in English, Spanish and Portuguese show highly significant results. Our inter-lingual experiment on predicting UN GII also shows significant potential for using the proposed metrics to predict country- and language-specific empirical statistics. | en_US |
dc.language.iso | en_US | en_US |
dc.title | Extracting Empirical Statistics on Gender Bias From Word Embeddings: A Multilingual Analysis | en_US |
dc.type | Princeton University Senior Theses | - |
pu.date.classyear | 2017 | en_US |
pu.department | Computer Science | en_US |
pu.pdf.coverpage | SeniorThesisCoverPage | - |
pu.contributor.authorid | 960889762 | - |
pu.contributor.advisorid | 960831988 | - |
pu.certificate | Center for Statistics and Machine Learning | en_US |
Appears in Collections: | Computer Science, 1988-2020 |
Files in This Item:
File | Size | Format |
---|---|---|
neamah_written_final_report_bound.pdf | 995.32 kB | Adobe PDF |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.