Published December 13, 2022 | Version v1
Conference paper Open

Wider or Deeper Neural Network Architecture for Acoustic Scene Classification with Mismatched Recording Devices

Authors/Creators

Description

In this paper, we present a robust, low-complexity model for Acoustic Scene Classification (ASC), the task of identifying the scene in which an audio recording was made. We first construct an ASC model based on a novel inception-residual network architecture, proposed to deal with the issue of mismatched recording devices. To further improve performance while keeping a low footprint, we apply two techniques to the proposed ASC model: an ensemble of multiple spectrograms and model compression. Through extensive experiments on the benchmark DCASE 2020 Task 1A Development dataset, our best model achieves an accuracy of 71.3% with a low complexity of 0.5 million (M) trainable parameters, which is highly competitive with state-of-the-art systems and shows potential for real-life applications on edge devices.
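As an illustration of the "ensemble of multiple spectrograms" idea, the sketch below fuses the class probabilities predicted by several models, each assumed to have been trained on a different spectrogram of the same clip. The description does not state which fusion rule the paper uses, so mean-probability fusion is an assumption here, and the function names, label set, and probability values are all hypothetical.

```python
from typing import List, Sequence


def mean_fusion(per_model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the class-probability vectors produced by several models.

    Assumption: each inner sequence is one model's probability vector for
    the same audio clip, with classes in the same order.
    """
    n_models = len(per_model_probs)
    if n_models == 0:
        raise ValueError("need at least one probability vector")
    n_classes = len(per_model_probs[0])
    if any(len(p) != n_classes for p in per_model_probs):
        raise ValueError("all probability vectors must have the same length")
    return [
        sum(p[c] for p in per_model_probs) / n_models
        for c in range(n_classes)
    ]


def predict_scene(per_model_probs: Sequence[Sequence[float]],
                  labels: Sequence[str]) -> str:
    """Return the label with the highest fused probability."""
    fused = mean_fusion(per_model_probs)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best]


# Hypothetical outputs of three models for one clip, one model per
# spectrogram type (illustrative values, not results from the paper).
LABELS = ["airport", "bus", "park"]
probs_spec_a = [0.5, 0.3, 0.2]
probs_spec_b = [0.2, 0.5, 0.3]
probs_spec_c = [0.2, 0.6, 0.2]

scene = predict_scene([probs_spec_a, probs_spec_b, probs_spec_c], LABELS)
print(scene)
```

In this toy example the first model alone would pick "airport", while the fused prediction follows the other two models. Other fusion rules, such as a product of probabilities or a weighted sum, would fit the same interface.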

Files

3551626.3564962.pdf (505.8 kB)
md5:0cb44610e0d4ddbfd07817a4f9b856d2

Additional details

Related works

Cites
Poster: 10.47839/ijc.21.2.2595 (DOI)

Dates

Accepted
2022-12-13