Optimizing PhiNet architectures for the detection of urban sounds on low-end devices
- 1. Digis Center, Fondazione Bruno Kessler, Trento, Italy
Description
Sound Event Detection (SED) pipelines identify and classify relevant events in audio streams. With typical applications in the smart city domain (e.g., crowd counting, alarm triggering), SED is an asset for municipalities and law enforcement agencies. Given the large size of the areas to be monitored and the amount of data generated by IoT sensors, large models running on centralised servers are not suitable for real-time applications. Conversely, performing SED directly on pervasive embedded devices is very attractive in terms of energy consumption, bandwidth requirements and privacy preservation. In a previous manuscript, we proposed scalable backbones from the PhiNets family of architectures for real-time sound event detection on microcontrollers. In this paper, we extend our analysis by investigating how PhiNets’ scaling parameters affect model performance in the SED task, while searching for the best configuration under given computational constraints. Experimental analysis on UrbanSound8K shows that, when training the model from scratch, only the total number of parameters matters (i.e., performance is independent of the scaling parameter configuration), whereas knowledge distillation is more effective with specific scaling configurations.
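The description does not state which distillation objective was used, so the following is only a minimal sketch of the commonly used soft-target formulation of knowledge distillation (a weighted sum of hard-label cross-entropy and a temperature-softened KL term). The function names, the temperature of 4.0 and the weight `alpha` of 0.5 are illustrative assumptions, not values taken from the paper; the 10-element logit vectors simply mirror the 10 classes of UrbanSound8K.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of a list of logits at a given temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Soft-target distillation loss (hypothetical helper, not the paper's code).

    loss = alpha * CE(student, label)
         + (1 - alpha) * T^2 * KL(teacher_T || student_T)
    """
    # Hard-label cross-entropy, computed at temperature 1.
    p_student = softmax(student_logits)
    cross_entropy = -math.log(p_student[label])

    # KL divergence between the temperature-softened teacher and student.
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(q_teacher, q_student) if t > 0)

    # The T^2 factor keeps the soft-term gradients on a comparable scale.
    return alpha * cross_entropy + (1 - alpha) * temperature ** 2 * kl


# Illustrative logits for one clip over 10 classes (made-up numbers).
teacher = [4.0, 1.0, 0.5, 0.0, -1.0, 0.2, 0.1, -0.5, 0.3, 0.0]
student = [2.0, 1.5, 0.0, 0.1, -0.5, 0.0, 0.2, -0.2, 0.1, 0.0]
loss = distillation_loss(student, teacher, label=0)
```

In a setup like this, a small student (for instance a heavily scaled-down backbone) is trained to match both the ground-truth label and the softened output distribution of a larger teacher.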
Files
Name | Size | md5 |
---|---|---|
Eusipco_2022_Brutti_etal.pdf | 459.2 kB | 78a7c6be1660c037a51a359e4b40cf95 |
Additional details
Related works
- Is published in
  - Conference paper: https://ieeexplore.ieee.org/document/9909572
- Is supplemented by
  - Software: https://github.com/fpaissan/phinet_pl
  - Dataset: https://urbansounddataset.weebly.com