Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp01g158bk91v
Title: | A MACHINE LEARNING APPROACH TO PRIVACY IN AUDIO-ENABLED IoT DEVICES |
Authors: | Karuri, Vincent |
Advisors: | Jha, Niraj K. |
Department: | Electrical Engineering |
Class Year: | 2017 |
Abstract: | IoT (Internet of Things) devices have proliferated in our everyday lives. Some estimates put their number at over 30 billion by 2020, with over 200 billion intermittent connections [1]. IoT devices are no longer limited to the most tech-savvy or well-to-do populations; almost everyone today owns a smart device, be it a smartphone, camera, TV set, or pair of headphones. This proliferation raises the problem of privacy in our homes, offices, and leisure spots, where we are surrounded by sensors that record and transmit our information without our knowledge. While this problem may be difficult to solve completely when the devices do not belong to us, most people would like to feel in control of what their personal devices can and cannot do. In this thesis, we focus on IoT devices that can record audio (audio-enabled devices). Broadly, these are devices with microphones, such as smartphones, smart TVs, and voice assistants (e.g., Amazon Echo, Siri, and Google Now). The challenge we address is that recordings made by these devices may contain private speech that users inadvertently share with them. The devices usually send such recordings to their servers, which store this sensitive information, again without the end user's knowledge. This violation of privacy is a serious problem and will grow as IoT devices become ubiquitous. We propose a system, implemented between the device and its server, that addresses the problem of sensitive data leakage. The system filters predefined blacklisted words out of speech recordings before passing the recordings on to the IoT device application, which then sends the information to its servers. It uses robust audio feature extraction techniques and machine learning algorithms to select the best feature set for classifying words as blacklisted or whitelisted.
The positively identified blacklisted words are then cut out or zeroed out of the speech signal. On single-word sentences, the system achieved 89% accuracy in identifying blacklisted words and 96% accuracy in identifying whitelisted words. On multi-word sentences, it achieved 87% and 88% accuracy, respectively. Such a system would be a good foundation for implementing user-controlled privacy in IoT devices. |
URI: | http://arks.princeton.edu/ark:/88435/dsp01g158bk91v |
Type of Material: | Princeton University Senior Theses |
Language: | en_US |
Appears in Collections: | Electrical Engineering, 1932-2020 |
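The pipeline summarized in the abstract (split speech into words, extract features from each word, classify each word as blacklisted or whitelisted, then zero out the blacklisted ones) can be illustrated with a toy sketch. Everything below is an assumption chosen for brevity, not the thesis's implementation: an energy-threshold word segmenter that assumes words are separated by silence, two hand-picked features (log energy and zero-crossing rate), a nearest-centroid classifier in place of the thesis's machine learning algorithms, and pure sine tones standing in for spoken words. All function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical sketch of blacklist filtering; features and classifier are toy stand-ins.
Segment = Tuple[int, int]  # [start, end) sample indices of one detected word


def segment_words(signal: Sequence[float], frame_len: int = 160,
                  threshold: float = 1e-3) -> List[Segment]:
    """Treat each run of frames whose mean energy exceeds `threshold` as one word."""
    segments: List[Segment] = []
    start: Optional[int] = None
    for i in range(0, len(signal), frame_len):
        frame = signal[i:i + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy > threshold and start is None:
            start = i                        # a word begins in this frame
        elif energy <= threshold and start is not None:
            segments.append((start, i))      # the word ended before this frame
            start = None
    if start is not None:                    # a word runs to the end of the signal
        segments.append((start, len(signal)))
    return segments


def features(segment: Sequence[float]) -> Tuple[float, float]:
    """Toy feature vector: (log10 mean energy, zero-crossing rate)."""
    n = len(segment)
    energy = sum(x * x for x in segment) / n
    crossings = sum(1 for a, b in zip(segment, segment[1:]) if (a >= 0) != (b >= 0))
    return (math.log10(energy + 1e-12), crossings / max(n - 1, 1))


def train_centroids(examples: List[Tuple[Tuple[float, float], str]]) -> Dict[str, List[float]]:
    """Nearest-centroid training: average the feature vectors of each label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vec, label in examples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for k, v in enumerate(vec):
            acc[k] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(vec: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


def redact(signal: Sequence[float], centroids: Dict[str, List[float]],
           frame_len: int = 160, threshold: float = 1e-3) -> List[float]:
    """Zero out every detected word that the classifier labels 'blacklisted'."""
    out = list(signal)
    for start, end in segment_words(signal, frame_len, threshold):
        if classify(features(signal[start:end]), centroids) == "blacklisted":
            out[start:end] = [0.0] * (end - start)
    return out


def tone(freq: float, n: int, rate: int = 8000, amp: float = 0.5) -> List[float]:
    """Synthetic stand-in for a spoken word: a pure sine tone sampled at `rate` Hz."""
    return [amp * math.sin(2 * math.pi * freq * t / rate) for t in range(n)]
```

For example, training on low tones labeled "whitelisted" and high tones labeled "blacklisted", then calling `redact` on a low tone, a gap of silence, and a high tone, leaves the first word intact and zeroes the second. The default `frame_len=160` corresponds to 20 ms at the assumed 8 kHz sample rate. Real speech would need far richer features and a segmenter that copes with words not separated by silence, which is where the feature selection and learning described in the abstract come in.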
Files in This Item:
File | Size | Format |
---|---|---|
senior_thesis_final_report.pdf | 693.21 kB | Adobe PDF |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.