Two-stage deep learning approach for speech enhancement and reconstruction in the frequency and time domains

Nossier, Soha; Wall, Julie; Moniri, Mansour; Glackin, Cornelius; Cannings, Nigel

doi:10.5281/zenodo.7017108

Published July 28, 2022 | Version preprint

Conference paper Open

1. University of East London
2. Intelligent Voice Ltd

Deep learning has recently shown promising improvements in speech enhancement, owing to its effectiveness in eliminating noise. However, a drawback of the denoising process is the introduction of speech distortion, which degrades speech quality and intelligibility. In this work, we propose a deep convolutional denoising autoencoder-based speech enhancement network designed with an encoder deeper than the decoder, to improve performance and decrease complexity. Furthermore, we present a two-stage learning approach: in the first stage, denoising is performed in the frequency domain using the magnitude spectrum as the training target; in the second stage, further denoising and speech reconstruction are performed in the time domain. Results show that our architecture achieves a 0.22 improvement in the overall predicted mean opinion score (Covl) over state-of-the-art speech enhancement architectures on the Valentini dataset benchmark. Moreover, when trained on a larger dataset and tested on a mismatched test corpus, the architecture achieves improvements of 0.7 in Perceptual Evaluation of Speech Quality (PESQ) and 6.35% in Short-Time Objective Intelligibility (STOI) scores, compared to the noisy speech.
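
The two-stage data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`stft`, `istft`, `two_stage_enhance`), the frame parameters, the reuse of the noisy phase when returning to the time domain, and the hand-off between stages are all assumptions made for illustration. The two networks are represented by hypothetical callables (`freq_denoiser`, `time_refiner`), and the encoder and decoder depths are not modeled here.

```python
import numpy as np


def stft(x, n_fft=512, hop=128):
    """Hann-windowed short-time Fourier transform (frames x frequency bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft=512, hop=128):
    """Overlap-add inverse STFT, normalized by the summed squared window."""
    window = np.hanning(n_fft)
    n_frames = spec.shape[0]
    length = n_fft + hop * (n_frames - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    for i in range(n_frames):
        start = i * hop
        out[start : start + n_fft] += frames[i] * window
        norm[start : start + n_fft] += window ** 2
    # Guard against division by zero where the window vanishes (signal edges).
    return out / np.maximum(norm, 1e-12)


def two_stage_enhance(noisy, freq_denoiser, time_refiner, n_fft=512, hop=128):
    """Sketch of a frequency-domain stage followed by a time-domain stage.

    freq_denoiser: hypothetical callable mapping a noisy magnitude
        spectrogram to an enhanced one (stands in for the stage-1 network).
    time_refiner: hypothetical callable mapping a waveform to a waveform
        (stands in for the stage-2 network).
    """
    spec = stft(noisy, n_fft, hop)
    magnitude, phase = np.abs(spec), np.angle(spec)

    # Stage 1: denoise the magnitude spectrum.
    enhanced_magnitude = freq_denoiser(magnitude)

    # Assumption: the noisy phase is reused to return to the time domain.
    stage1_waveform = istft(enhanced_magnitude * np.exp(1j * phase), n_fft, hop)

    # Stage 2: further denoising and reconstruction on the waveform.
    return time_refiner(stage1_waveform)
```

With identity placeholders for both callables, for example `two_stage_enhance(x, lambda m: m, lambda w: w)`, the pipeline reduces to an analysis and resynthesis round trip, which is a convenient check that the plumbing between the stages is consistent before trained models are plugged in.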

Files

Name	Size
Nossier_Manuscript.pdf (md5:91f70d8cb2800baa74c5b50742c982fa)	3.7 MB

Additional details

European Commission
MENHIR – Mental health monitoring through interactive conversations 823907

	All versions	This version
Views	43	43
Downloads	58	58
Data volume	236.1 MB	236.1 MB
