Acoustic Scene Classification using Convolutional Neural Networks on Multivariate Audio

doi:10.5281/zenodo.3445618

Published September 19, 2019 | Version v1

Thesis Open

Jones, Marc Kaivon¹

1. Universitat Pompeu Fabra

Supervisors:

1. Universitat Pompeu Fabra

Research has shown the efficacy of using convolutional neural networks (CNNs) with audio spectrograms in machine listening tasks such as acoustic scene classification (ASC). There is, however, a knowledge gap when it comes to standardizing preprocessing practices for this form of ASC. Researchers using these methods have had little guidance on how best to represent their audio data for consumption by a CNN, and often rely on transfer learning from adjacent machine listening tasks. This work explores how frequency limens and channel depth affect ASC accuracy with CNNs of three different varieties: generic, deep, and wide. Results show that variability in the representation of spectral audio information plays a crucial role in classifier performance. Classification accuracy improved when using multi-channel representations of audio data rather than a single-channel alternative. Classification accuracy decreased when the representative spectra contained less frequency information, although this effect was smaller than that of channel depth. This pattern was nearly consistent across all of the proposed CNN architectures. These findings have direct implications for several academic and industrial machine listening applications. In the academic realm, they work towards codifying audio data preprocessing practices and network architectural decisions. In industry, the results open the door to exploring the use of substandard microphones in technologies that employ machine listening, such as commodity hardware.
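The two representation choices discussed in the abstract, channel depth and upper frequency limit, can be illustrated with a minimal NumPy sketch. This is not the thesis's preprocessing pipeline, which is not described on this page; the function names (`stft_magnitude`, `to_cnn_input`), the FFT size, the hop length, and the log-magnitude scaling are all assumptions chosen for illustration.

```python
import numpy as np


def stft_magnitude(signal, n_fft=1024, hop=512):
    """Magnitude spectrogram of a 1-D signal, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # One-sided FFT per frame, transposed to (frequency, time).
    return np.abs(np.fft.rfft(frames, axis=1)).T


def to_cnn_input(audio, sr, f_max=None, mono=False, n_fft=1024, hop=512):
    """Hypothetical helper: turn (n_channels, n_samples) audio into a
    (channels, freq_bins, frames) log-magnitude array.

    mono=True averages the channels first (single-channel representation).
    f_max, if given, discards frequency bins above that limit in Hz.
    """
    if mono:
        audio = audio.mean(axis=0, keepdims=True)
    spec = np.stack([stft_magnitude(ch, n_fft, hop) for ch in audio])
    if f_max is not None:
        # Centre frequency of each one-sided FFT bin, in Hz.
        freqs = np.arange(n_fft // 2 + 1) * sr / n_fft
        spec = spec[:, freqs <= f_max, :]
    return np.log1p(spec)  # compress dynamic range


# Synthetic two-channel noise stands in for a real stereo recording.
sr = 16000
audio = np.random.default_rng(0).standard_normal((2, sr))

stereo_full = to_cnn_input(audio, sr)              # both channels, full band
mono_full = to_cnn_input(audio, sr, mono=True)     # channels averaged
stereo_4k = to_cnn_input(audio, sr, f_max=4000.0)  # band-limited to 4 kHz
```

Under these assumptions, the arrays differ only in their leading (channel) axis or their frequency axis, which are the two dimensions a CNN input layer would need to accommodate when comparing such representations.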

Files

Files (12.1 MB)

Name	Size
mjones_smcthesis.pdf (md5:15a873ecc981954e0fafe28bfa149f2c)	12.1 MB

	All versions	This version
Views	182	179
Downloads	134	131
Data volume	1.9 GB	1.8 GB
