Deep Noise Suppression for Real Time Speech Enhancement in a Single Channel Wide Band Scenario
Creators
Description
Speech enhancement can be regarded as a dual task that addresses two important issues of degraded speech: speech quality and speech intelligibility. Improved speech quality can reduce listener fatigue, whereas improved speech intelligibility can reduce the listener’s effort to understand and extract meaning from speech. This work focuses on speech quality in a real-time context. Algorithms that improve speech quality are sometimes referred to as noise suppression algorithms, since they enhance quality by suppressing the background noise of the degraded speech. Improving state-of-the-art noise suppression algorithms could bring significant benefits to applications such as video conferencing systems, phone calls, and speech recognition systems. Real-time-capable algorithms are especially important for devices with limited processing power and physical constraints that cannot make use of large architectures, such as hearing aids or wearables. This work uses a deep-learning-based approach to expand on two previously proposed architectures in the context of the Deep Noise Suppression Challenge organized by Microsoft. This challenge has provided datasets and resources to teams of researchers with the common goal of fostering research on this topic. The outcome of this thesis can be divided into three main contributions: first, an extended comparison of six variants of the two selected models, covering performance, computational complexity, and real-time efficiency analyses; second, an open-source implementation of one of the proposed architectures, together with a framework translation of an existing implementation; and finally, proposed variants that outperform the previously defined models in terms of denoising performance, complexity, and real-time efficiency.
Files
Name | Size |
---|---|
2021-Esteban-Gómez.pdf (md5:1839523207e32314226ecd2f0cf3ec6e) | 2.5 MB |