Dysarthric Speech Recognition From Raw Waveform with Parametric CNNs

Yue, Zhengjun; Loweimi, Erfan; Christensen, Heidi; Barker, Jon; Cvetkovic

doi:10.5281/zenodo.7180827

Published September 18, 2022 | Version v1

Conference paper Open

1. King's College London
2. The University of Sheffield

Raw waveform acoustic modelling has recently received increasing attention. Compared with task-blind hand-crafted features, which may discard useful information, representations learned directly from the raw waveform are task-specific and can potentially retain all task-relevant information. In the context of automatic dysarthric speech recognition (ADSR), raw waveform acoustic modelling is under-explored owing to data scarcity. Parametric CNNs can compensate for this problem because they have notably fewer parameters and require less training data than conventional non-parametric CNNs. In this paper, we explore the usefulness of raw waveform acoustic modelling using various parametric CNNs for ADSR. Additionally, we investigate the properties of the learned filters and monitor the training dynamics of various models. Furthermore, we study the effectiveness of data augmentation and of multi-stream acoustic modelling that combines non-parametric and parametric CNNs fed by hand-crafted and raw waveform features. Experimental results on the widely used TORGO dysarthric database show that the parametric CNNs significantly outperform the non-parametric CNNs on dysarthric speech (up to 2.7% and 1.8% absolute error reduction), achieving WERs as low as 35.9% and 11.9% for dysarthric and typical speech, respectively. Multi-stream acoustic modelling further improves performance, resulting in WERs as low as 33.2% and 10.3% for dysarthric and typical speech, respectively.
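The parameter saving mentioned in the abstract can be illustrated with a small sketch. The abstract does not name the parametric filter families used, so this example assumes one well-known kind, a sinc-based band-pass filter whose whole kernel is determined by two cutoff frequencies. The function names, filter count, kernel length and sample rate below are illustrative assumptions, not values taken from the paper.

```python
import math


def sinc_bandpass(f_low, f_high, num_taps, sample_rate):
    """Build a band-pass FIR kernel from only two parameters (cutoffs in Hz).

    Assumption: a Hamming-windowed difference of two ideal low-pass filters,
    used here as one example of a parametric first-layer filter.
    """
    def lowpass(fc, t):
        # Ideal low-pass impulse response 2*fc*sinc(2*fc*t), with t in seconds.
        x = 2.0 * math.pi * fc * t
        return 2.0 * fc * (1.0 if x == 0.0 else math.sin(x) / x)

    mid = (num_taps - 1) / 2.0
    kernel = []
    for n in range(num_taps):
        t = (n - mid) / sample_rate
        # Hamming window to reduce ripple caused by truncating the sinc.
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        kernel.append((lowpass(f_high, t) - lowpass(f_low, t)) * window / sample_rate)
    return kernel


def first_layer_param_counts(num_filters, num_taps):
    """Trainable weights in the first layer (biases ignored)."""
    conventional = num_filters * num_taps  # every tap is learned freely
    parametric = num_filters * 2           # two cutoffs per filter
    return conventional, parametric


# Illustrative sizes only (hypothetical, not from the paper).
NUM_FILTERS, NUM_TAPS, SAMPLE_RATE = 40, 129, 16000

kernel = sinc_bandpass(300.0, 3400.0, NUM_TAPS, SAMPLE_RATE)
conventional, parametric = first_layer_param_counts(NUM_FILTERS, NUM_TAPS)
print(len(kernel), conventional, parametric)
```

Under these assumed sizes the conventional layer learns 5160 weights while the parametric layer learns 80, which is the kind of reduction that makes such layers attractive when training data is scarce.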

Files (351.5 kB)

Name	Size
INTERSPEECH_2022 (12).pdf md5:d9374db0d4da549692a748e8cc196d2f	351.5 kB

Additional details

UK Research and Innovation
SpeechWave EP/R012180/1

	All versions	This version
Views	161	160
Downloads	167	165
Data volume	59.1 MB	58.4 MB
