Published July 3, 2024 | Version v1
Dataset (Open Access)

Target Speech Extraction Dataset for Knowledge Boosting (Part 2)

Contributors

Contact person / Data curator:

  • University of Washington

Description

Part 2 of the Target Speech Extraction Dataset, as described in "Knowledge boosting during low-latency inference" (Interspeech 2024).

Abstract: Models for low-latency, streaming applications could benefit from the knowledge capacity of larger models, but edge devices cannot run these models due to resource constraints. A possible solution is to transfer hints during inference from a large model running remotely to a small model running on-device. However, this incurs a communication delay that breaks real-time requirements and does not guarantee that both models will operate on the same data at the same time. We propose knowledge boosting, a novel technique that allows a large model to operate on time-delayed input during inference, while still boosting small model performance. Using a streaming neural network that processes 8 ms chunks, we evaluate different speech separation and enhancement tasks with communication delays of up to six chunks or 48 ms. Our results show larger gains where the performance gap between the small and large models is wide, demonstrating a promising method for large-small model collaboration for low-latency applications.
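The abstract describes a streaming setup in which a remote large model sends hints that arrive several chunks late, and the on-device small model fuses whatever hint is currently available. The sketch below illustrates that delayed-hint pattern only; the model functions, the additive fusion, and the `stream` helper are hypothetical placeholders, not the dataset authors' implementation.

```python
from collections import deque

CHUNK_MS = 8       # chunk duration used in the paper
DELAY_CHUNKS = 6   # largest communication delay studied (6 x 8 ms = 48 ms)

def small_model(chunk):
    # Hypothetical on-device model: identity pass-through for illustration.
    return chunk

def large_model(chunk):
    # Hypothetical remote model: emits a "hint" (here, just a scaled copy).
    return 0.5 * chunk

def stream(chunks, delay=DELAY_CHUNKS):
    """Fuse time-delayed large-model hints into the small model's output.

    At step t the available hint was computed from chunk t - delay;
    until the first hint arrives, the small model runs on its own.
    """
    in_transit = deque()   # hints "travelling" from the remote model
    outputs = []
    for chunk in chunks:
        in_transit.append(large_model(chunk))
        # A hint is usable only once it has spent `delay` steps in transit.
        hint = in_transit.popleft() if len(in_transit) > delay else 0.0
        outputs.append(small_model(chunk) + hint)
    return outputs
```

With a delay of two chunks, `stream([0.0, 1.0, 2.0, 3.0], delay=2)` yields `[0.0, 1.0, 2.0, 3.5]`: the first two outputs are the small model alone, and from the third step onward each output also carries the hint for the chunk seen two steps earlier.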

Files

Files (38.9 GB)

  • md5:326c7e19e220b0ed7d44f42d6718a2b2 (3.2 GB)
  • md5:1f524988213dde4141279261ba9de7f9 (32.4 GB)
  • md5:e01742d90d099a6e4137e0c268f25308 (3.2 GB)

Additional details

Related works

Is supplement to
Presentation: https://knowledgeboosting.cs.washington.edu/