Published June 14, 2022 | Version v1
Conference paper Open

Optimizing PhiNet architectures for the detection of urban sounds on low-end devices

  • 1. Digis Center, Fondazione Bruno Kessler Trento, Italy


Sound Event Detection (SED) pipelines identify and classify relevant events in audio streams. With typical applications in the smart city domain (e.g., crowd counting, alarm triggering), SED is an asset for municipalities and law enforcement agencies. Given the large size of the areas to be monitored and the amount of data generated by IoT sensors, large models running on centralised servers are not suitable for real-time applications. Conversely, performing SED directly on pervasive embedded devices is very attractive in terms of energy consumption, bandwidth requirements and privacy preservation. In a previous manuscript, we proposed scalable backbones from the PhiNets architecture family for real-time sound event detection on microcontrollers. In this paper, we extend that analysis by investigating how PhiNets' scaling parameters affect model performance in the SED task, while searching for the best configuration under the given computational constraints. Experimental analysis on UrbanSound8K shows that, when the model is trained from scratch, only the total number of parameters matters (i.e., performance is independent of the scaling parameter configuration), whereas knowledge distillation is more effective with specific scaling configurations.
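
The abstract does not say which distillation formulation was used. As an illustration only, the sketch below shows the widely used temperature-scaled formulation, in which a small student model is trained against both the ground-truth label and the softened outputs of a larger teacher. The function names, the temperature and alpha values, and the example logits are all assumptions made for this sketch and are not taken from the paper; the only dataset fact relied on is that UrbanSound8K has 10 classes.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of a list of logits at a given temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Hypothetical helper: weighted sum of a hard-label and a soft-label term.

    hard term: cross-entropy between the true label and the student output
    soft term: KL(teacher || student) on temperature-softened distributions,
               multiplied by temperature**2 to keep gradient scales comparable
    The default temperature and alpha are illustrative, not the paper's values.
    """
    # Cross-entropy against the ground-truth class (temperature 1).
    p_student = softmax(student_logits)
    hard = -math.log(p_student[label])

    # KL divergence between the softened teacher and student distributions.
    p_teacher_soft = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s)
               for t, s in zip(p_teacher_soft, p_student_soft) if t > 0)

    return alpha * hard + (1.0 - alpha) * temperature ** 2 * soft


# Made-up logits for one clip over the 10 UrbanSound8K classes.
teacher = [4.0, 1.0, 0.5, 0.0, -1.0, -1.0, 0.2, 0.1, -0.5, 0.3]
student = [2.0, 1.5, 0.0, 0.0, -0.5, -1.0, 0.0, 0.4, -0.2, 0.1]
loss = distillation_loss(student, teacher, label=0)
```

With alpha set to 1 the loss reduces to ordinary supervised training from scratch, and with alpha set to 0 the student learns from the teacher alone, so alpha controls how much the student relies on the teacher.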



Additional details

Related works

Is published in
Conference paper: (URL)
Is supplemented by
Software: (URL)
Dataset: (URL)


European Commission: MARVEL – Multimodal Extreme Scale Data Analytics for Smart Cities Environments (957337)