Wider or Deeper Neural Network Architecture for Acoustic Scene Classification with Mismatched Recording Devices
Authors/Creators
Description
In this paper, we present a robust, low-complexity model for Acoustic Scene Classification (ASC), the task of identifying the scene in which an audio recording was made. We first construct an ASC model built on a novel inception-residual-based network architecture designed to handle mismatched recording devices. To further improve performance while keeping the model footprint low, we apply two techniques to the proposed ASC model: an ensemble of multiple spectrograms and model compression. In extensive experiments on the benchmark DCASE 2020 Task 1A Development dataset, our best model achieves an accuracy of 71.3% with a low complexity of 0.5 million (M) trainable parameters. This result is competitive with state-of-the-art systems and shows potential for real-life applications on edge devices.
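The abstract mentions an ensemble of multiple spectrograms but does not state how the member models' outputs are combined. The sketch below shows one common option, late fusion of per-model class probabilities, under that assumption. The mean and product rules, the function names `fuse_scores` and `predict`, and the example scores are all hypothetical illustrations, not the paper's implementation.

```python
import math
from typing import List, Sequence


def fuse_scores(per_model_probs: Sequence[Sequence[float]], rule: str = "mean") -> List[float]:
    """Late fusion of class-probability vectors for a single audio clip.

    Each inner sequence is one model's softmax output, e.g. from models
    trained on different spectrogram types (hypothetical setup).
    rule="mean" averages the probabilities; rule="product" multiplies them
    (computed as a sum of logs) and renormalises the result.
    """
    if not per_model_probs:
        raise ValueError("need at least one model output")
    n_classes = len(per_model_probs[0])
    if any(len(p) != n_classes for p in per_model_probs):
        raise ValueError("all models must score the same classes")

    if rule == "mean":
        return [sum(p[c] for p in per_model_probs) / len(per_model_probs)
                for c in range(n_classes)]
    if rule == "product":
        eps = 1e-12  # guards against log(0)
        logs = [sum(math.log(max(p[c], eps)) for p in per_model_probs)
                for c in range(n_classes)]
        m = max(logs)  # subtract the max before exp for numerical stability
        exps = [math.exp(v - m) for v in logs]
        total = sum(exps)
        return [e / total for e in exps]
    raise ValueError(f"unknown fusion rule: {rule}")


def predict(per_model_probs: Sequence[Sequence[float]], rule: str = "mean") -> int:
    """Return the index of the class with the highest fused score."""
    fused = fuse_scores(per_model_probs, rule)
    return max(range(len(fused)), key=fused.__getitem__)


# Usage: three hypothetical ensemble members scoring three classes.
outputs = [
    [0.6, 0.3, 0.1],
    [0.1, 0.6, 0.3],
    [0.5, 0.4, 0.1],
]
print(predict(outputs, "mean"))     # → 1
print(predict(outputs, "product"))  # → 1
```

Mean fusion is tolerant of one member being confidently wrong, while product fusion rewards classes that every member agrees on. Which rule suits a spectrogram ensemble is an empirical question.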
Files
| Name | MD5 | Size |
|---|---|---|
| 3551626.3564962.pdf | 0cb44610e0d4ddbfd07817a4f9b856d2 | 505.8 kB |
Additional details
Related works
- Cites
  - Poster: 10.47839/ijc.21.2.2595 (DOI)
Dates
- Accepted: 2022-12-13