Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp012514np477
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Russakovsky, Olga | -
dc.contributor.author | Yoon, Phillip | -
dc.date.accessioned | 2020-08-12T16:08:14Z | -
dc.date.available | 2020-08-12T16:08:14Z | -
dc.date.created | 2020-05-02 | -
dc.date.issued | 2020-08-12 | -
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/dsp012514np477 | -
dc.description.abstract | This paper details the design of a self-supervised model for sound separation and localization that capitalizes on the natural correspondence between the audio and visual modalities of video. Because the auditory and visual components are temporally aligned, our deep learning-based approach can fuse these signals to jointly learn the tasks of separation and localization. For every pixel region in a video, a binary mask is predicted and overlaid on a spectrogram representation of the input audio to estimate the sound coming from that region. To train our neural network, we employ a mix-and-separate framework that synthetically creates training data from our dataset of stabilized videos. Our joint audio-visual model achieved high performance, demonstrating the success of the proposed architecture in separating and localizing sound in videos. | en_US
dc.format.mimetype | application/pdf | -
dc.language.iso | en | en_US
dc.title | Improving Sound Separation and Localization Using Audio-Visual Scene Analysis | en_US
dc.type | Princeton University Senior Theses | -
pu.date.classyear | 2020 | en_US
pu.department | Computer Science | en_US
pu.pdf.coverpage | SeniorThesisCoverPage | -
pu.contributor.authorid | 920058657 | -
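
The abstract above describes two core ideas: a mix-and-separate scheme that builds training pairs by synthetically mixing audio from different videos, and a predicted binary mask that is overlaid on the mixture spectrogram to recover the sound attributed to a pixel region. The following is a minimal, hypothetical sketch of those two steps using NumPy/SciPy; the sample rate, window size, mask threshold, and all function names are illustrative assumptions and are not taken from the thesis code.

import numpy as np
from scipy.signal import stft, istft

FS = 16000        # assumed audio sample rate
NPERSEG = 1022    # assumed STFT window length

def make_mix_and_separate_pair(audio_a, audio_b):
    """Mix two videos' audio tracks; the mixture spectrogram is the network
    input and a binary mask for each source is the training target."""
    mixture = audio_a + audio_b
    _, _, spec_mix = stft(mixture, fs=FS, nperseg=NPERSEG)
    _, _, spec_a = stft(audio_a, fs=FS, nperseg=NPERSEG)
    # Ground-truth binary mask: time-frequency bins where source A dominates
    # the mixture (illustrative threshold, not the thesis definition).
    target_mask_a = (np.abs(spec_a) >= 0.5 * np.abs(spec_mix)).astype(np.float32)
    return spec_mix, target_mask_a

def apply_region_mask(spec_mix, predicted_mask):
    """Overlay a predicted binary mask on the mixture spectrogram to estimate
    the sound coming from one pixel region, then invert back to a waveform."""
    masked_spec = spec_mix * predicted_mask
    _, estimated_audio = istft(masked_spec, fs=FS, nperseg=NPERSEG)
    return estimated_audio

# Toy usage with random signals standing in for two videos' audio tracks.
rng = np.random.default_rng(0)
a, b = rng.standard_normal(FS), rng.standard_normal(FS)
spec_mix, mask_a = make_mix_and_separate_pair(a, b)
recovered_a = apply_region_mask(spec_mix, mask_a)

In the actual system the mask would be produced by the audio-visual network for each pixel region rather than computed from the ground-truth source, but the masking and inversion steps follow the same pattern.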
Appears in Collections: Computer Science, 1988-2020

Files in This Item:
File | Description | Size | Format
YOON-PHILLIP-THESIS.pdf | | 2.33 MB | Adobe PDF


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.