Speech Removal Framework for Privacy-preserving Audio Recordings
Authors/Creators
Description
WASPAA 2025 Demo
Public datasets such as The Sounds of Home are recorded in people's homes to capture everyday soundscapes. Such audio recordings from home environments provide valuable information for recognizing daily activities, monitoring health and wellbeing, and enabling smart home applications. They also support the development of robust sound event detection systems under real-world conditions. However, in-home recordings contain sensitive personal information in the form of speech signals, and this information must be removed from domestic audio recordings before the datasets are shared publicly. This demonstration showcases real-time identification of personal information, in our case speech, using several AI models: convolutional neural networks (PANNs, E-PANNs), a Transformer model (AST), and voice activity detection (VAD) models (Silero, WebRTC). Our focus is twofold: (1) to design a speech removal system that identifies and removes speech from recorded audio in real time; and (2) to examine how well AI models can distinguish speech from non-speech audio. The demonstration is a simple, easy-to-use, software-based GUI.
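The removal step described above can be illustrated with a minimal sketch. It is not the demo's implementation: it assumes some detector (for example, one of the models listed above) has already produced one speech score per fixed-length frame, and it simply silences every frame whose score reaches a threshold. The function name `remove_speech`, its parameters, and the toy values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def remove_speech(
    samples: Sequence[float],
    speech_probs: Sequence[float],
    frame_len: int,
    threshold: float = 0.5,
) -> List[float]:
    """Return a copy of `samples` with speech frames replaced by silence.

    speech_probs[i] is an assumed detector score for the frame covering
    samples[i * frame_len : (i + 1) * frame_len].
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    cleaned = list(samples)  # copy so the input is left untouched
    for i, prob in enumerate(speech_probs):
        if prob >= threshold:
            start = i * frame_len
            end = min(start + frame_len, len(cleaned))
            # Zero out every sample in a frame flagged as speech.
            for j in range(start, end):
                cleaned[j] = 0.0
    return cleaned


# Toy input: 8 samples split into 4 frames of 2 samples each.
audio = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
scores = [0.1, 0.9, 0.2, 0.7]  # hypothetical per-frame speech scores
print(remove_speech(audio, scores, frame_len=2))
# prints [0.1, 0.2, 0.0, 0.0, 0.5, 0.6, 0.0, 0.0]
```

Muting whole frames is the simplest possible choice. A real-time system would likely also need to smooth decisions across neighbouring frames so that short detector dropouts do not leak fragments of speech.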
Files
WASPAA2025-Demo.pdf
| Name | Size |
|---|---|
| WASPAA2025-Demo.pdf (md5:0b936050cef064ac5ddbf7f5e9078df3) | 525.5 kB |
Additional details
Software
- Repository URL: https://github.com/gbibbo/vad_demo
- Programming language: Python
- Development Status: Active