Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp01tx31qm55v
Title: Privacy Implications of Not-So-Hidden Comments in arXiv Files and Analysis of Online Privacy Policies
Authors: Li, Frank
Advisors: Narayanan, Arvind
Department: Electrical Engineering
Certificate Program: Applications of Computing Program
Class Year: 2019
Abstract: Internet users can accidentally expose their own private information in myriad ways. This paper describes our approach to a large-scale measurement study of one form of online privacy leakage: users upload files for publication and sharing, and those files can contain private information hidden within them. We analyze comments in the TeX source files of arXiv publications using various natural language processing techniques to identify specific attributes of comments that may represent privacy violations. We also perform near-duplicate detection and clustering on a large data set of privacy policy texts to understand how online privacy policies are communicated to users. We find that arXiv publications contain many interesting comments despite the ease with which authors can strip out all comments. We also find that many privacy policy texts are duplicates or near-duplicates of one another.
URI: http://arks.princeton.edu/ark:/88435/dsp01tx31qm55v
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Electrical Engineering, 1932-2020
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
LI-FRANK-THESIS.pdf | | 717.91 kB | Adobe PDF | Request a copy |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.