Algorithms for Data Normalization with Applications to Stop and Frisk

Fillmore, Mark

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01wp988n13p

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Walker, David	-
dc.contributor.author	Fillmore, Mark	-
dc.date.accessioned	2015-06-26T13:44:37Z	-
dc.date.available	2015-06-26T13:44:37Z	-
dc.date.created	2015-04-30	-
dc.date.issued	2015-06-26	-
dc.identifier.uri	http://arks.princeton.edu/ark:/88435/dsp01wp988n13p	-
dc.description.abstract	Data modeling is an already difficult task that is made harder by data-entry errors. Inconsistencies in large quantities of data can make it difficult to perform any kind of automated analysis. We motivate our investigation into improved data cleaning methods by revealing disastrous non-uniformity in data related to the controversial Stop and Frisk policy as implemented by the NYPD. These inconsistencies guide our construction of workflow F, which consults multiple similarity measurements to determine the proper transformation of non-uniform data into standardized values. F increases the volume of non-standardized data that is correctly transformed by 887% in comparison to common existing methods, such as the Levenshtein distance. We conclude by presenting additional pathways for improvement and describing how to most effectively apply workflow F as part of an interactive tool.	en_US
dc.format.extent	44 pages	en_US
dc.language.iso	en_US	en_US
dc.title	Algorithms for Data Normalization with Applications to Stop and Frisk	en_US
dc.type	Princeton University Senior Theses	-
pu.date.classyear	2015	en_US
pu.department	Computer Science	en_US
pu.pdf.coverpage	SeniorThesisCoverPage	-
Appears in Collections:	Computer Science, 1988-2020

Files in This Item:

File	Size	Format
PUTheses2015-Fillmore_Mark.pdf	858.51 kB	Adobe PDF
