Published October 15, 2025 | Version v2

Speech Removal Framework for Privacy-preserving Audio Recordings

Description

WASPAA 2025 Demo

Public datasets such as The Sounds of Home are recorded in people's homes to capture everyday soundscapes. Such audio recordings from home environments provide valuable information for recognizing daily activities, monitoring health and wellbeing, and enabling smart home applications. They also support the development of robust sound event detection systems under real-world conditions. However, in-home recordings contain sensitive personal information in the form of speech signals, so it is crucial to remove speech from domestic audio recordings before the datasets are shared publicly. This demonstration showcases real-time identification of personal information, in our case speech, using several AI models: convolutional neural networks (PANNs, E-PANNs), a Transformer model (AST), and voice activity detection (VAD) models (Silero, WebRTC). Our focus is twofold: (1) to design a speech removal system that identifies and removes speech from recorded audio in real time, and (2) to assess how well AI models distinguish speech from non-speech audio. The demonstration is a simple, easy-to-use, software-based GUI.
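As a rough illustration of the "identify and remove" idea, the following minimal Python sketch mutes every audio frame whose speech score reaches a threshold. It is not taken from the demo repository. The function name `mute_speech_frames`, the 0.5 threshold, and the choice to replace speech with silence are all assumptions made for this example. The per-frame scores are supplied by hand here; in practice they would come from a detector such as the models named above.

```python
from typing import List, Sequence


def mute_speech_frames(
    samples: Sequence[float],
    speech_probs: Sequence[float],
    frame_len: int,
    threshold: float = 0.5,
) -> List[float]:
    """Return a copy of `samples` with every frame flagged as speech zeroed.

    speech_probs[i] is a (hypothetical) detector score for frame i, which
    covers samples[i * frame_len : (i + 1) * frame_len].
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    cleaned = list(samples)
    for i, prob in enumerate(speech_probs):
        if prob >= threshold:  # assumption: scores at or above threshold mean speech
            start = i * frame_len
            end = min(start + frame_len, len(cleaned))
            for n in range(start, end):
                cleaned[n] = 0.0  # assumption: speech is replaced by silence
    return cleaned


# Toy input: three frames of two samples each; frames 0 and 2 are flagged.
audio = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
probs = [0.9, 0.1, 0.7]
print(mute_speech_frames(audio, probs, frame_len=2))
```

Lowering the threshold removes more audio and reduces the chance of leaving speech in the recording, at the cost of discarding more non-speech sound.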

Files

WASPAA2025-Demo.pdf

Files (525.5 kB)

md5:0b936050cef064ac5ddbf7f5e9078df3

Additional details

Software

Repository URL
https://github.com/gbibbo/vad_demo
Programming language
Python
Development Status
Active