Wider or Deeper Neural Network Architecture for Acoustic Scene Classification with Mismatched Recording Devices

Pham, Lam

doi:10.1145/3551626.3564962

Published December 13, 2022 | Version v1

Conference paper Open

In this paper, we present a robust, low-complexity model for Acoustic Scene Classification (ASC), the task of identifying the scene in which an audio recording was made. We first construct an ASC model built on a novel inception-residual-based network architecture, designed to deal with the issue of mismatched recording devices. To further improve performance while keeping a low footprint, we apply two techniques to the proposed ASC model: an ensemble of multiple spectrograms and model compression. In extensive experiments on the benchmark DCASE 2020 Task 1A Development dataset, our best model achieves an accuracy of 71.3% with a low complexity of 0.5 million (M) trainable parameters, which is competitive with state-of-the-art systems and shows potential for real-life applications on edge devices.
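To make the idea of an ensemble of multiple spectrograms concrete, the following is a minimal sketch of late fusion, assuming one model per spectrogram type and simple averaging of class probabilities. The abstract does not state which fusion rule or which spectrograms are used, so the averaging rule, the function names, and the example scores below are all hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_fusion(per_model_probs):
    """Average class probabilities across models (assumed fusion rule)."""
    n_models = len(per_model_probs)
    n_classes = len(per_model_probs[0])
    return [
        sum(probs[c] for probs in per_model_probs) / n_models
        for c in range(n_classes)
    ]


def predict(per_model_logits):
    """Return the fused class index and the fused probability vector."""
    probs = [softmax(logits) for logits in per_model_logits]
    fused = mean_fusion(probs)
    label = max(range(len(fused)), key=fused.__getitem__)
    return label, fused


# Hypothetical scores for three scene classes from three models,
# each assumed to be trained on a different spectrogram of the same clip.
example_logits = [
    [2.0, 1.0, 0.1],  # model on spectrogram A favours class 0
    [0.5, 2.5, 0.3],  # model on spectrogram B favours class 1
    [0.2, 2.2, 0.4],  # model on spectrogram C favours class 1
]
label, fused = predict(example_logits)
```

In this toy case the first model disagrees with the other two, and the averaged probabilities select the class that the majority supports. Other fusion rules (for example, products of probabilities or weighted sums) would fit the same interface.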

Files

Name	Size
3551626.3564962.pdf (md5:0cb44610e0d4ddbfd07817a4f9b856d2)	505.8 kB

Additional details

Cites: Poster: 10.47839/ijc.21.2.2595 (DOI)

Accepted: 2022-12-13
