Please use this identifier to cite or link to this item:
http://arks.princeton.edu/ark:/88435/dsp01ht24wn053
Title: | Transfer Learning for Music Genre Classification with Sparse Convolutional Autoencoders |
Authors: | Tam, Howard |
Advisors: | Kung, Sun-Yuan |
Department: | Electrical Engineering |
Class Year: | 2017 |
Abstract: | Most research on music genre classification today uses the GTZAN dataset because few large-scale datasets are publicly available. At the same time, deep learning techniques, which require large amounts of training data to learn meaningful architectures, are rapidly becoming more popular as music streaming services demand more efficient and accurate recommender systems. Inspired by the recent effectiveness of transfer learning in computer vision tasks, this paper investigates the feasibility of knowledge transfer: representations are learned in an unsupervised manner from moderately sized datasets, such as the CAL500 dataset and the recently released Free Music Archive (FMA) dataset, as a form of pre-training for a target classification task on small datasets such as GTZAN. We used the Scattering Wavelet Transform (SWT) to generate feature vectors consisting of two orders of modulation spectrum coefficients, which represent the original waveforms better than Mel-Frequency Cepstral Coefficients (MFCC) because they retain time-varying structures in the signal. Our proposed implementation uses sparse convolutional autoencoders to pre-train deep convolutional neural networks. We show that a hybrid approach, freezing the first layer of the transferred parameters and fine-tuning the rest of the neural network to the target task, consistently resulted in the best classification performance. |
URI: | http://arks.princeton.edu/ark:/88435/dsp01ht24wn053 |
Type of Material: | Princeton University Senior Theses |
Language: | en_US |
Appears in Collections: | Electrical Engineering, 1932-2020 |
Files in This Item:
File | Size | Format |
---|---|---|
thesis.pdf | 1.2 MB | Adobe PDF |
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.