Dysarthric Speech Recognition From Raw Waveform with Parametric CNNs
Authors/Creators
- 1. King's College London
- 2. The University of Sheffield
Description
Raw waveform acoustic modelling has recently received increasing attention. Compared with task-blind hand-crafted features, which may discard useful information, representations learned directly from the raw waveform are task-specific and can potentially retain all task-relevant information. In the context of automatic dysarthric speech recognition (ADSR), raw waveform acoustic modelling is under-explored owing to data scarcity. Parametric CNNs can compensate for this problem because they have notably fewer parameters and require less training data than conventional non-parametric CNNs. In this paper, we explore the usefulness of raw waveform acoustic modelling using various parametric CNNs for ADSR. Additionally, we investigate the properties of the learned filters and monitor the training dynamics of various models. Furthermore, we study the effectiveness of data augmentation and of multi-stream acoustic modelling, which combines non-parametric and parametric CNNs fed by hand-crafted and raw waveform features. Experimental results on the widely used TORGO dysarthric database show that the parametric CNNs significantly outperform the non-parametric CNNs on dysarthric speech (up to 2.7% and 1.8% absolute error reduction), reaching word error rates (WERs) of 35.9% and 11.9% for dysarthric and typical speech, respectively. Multi-stream acoustic modelling further improves performance, resulting in WERs of 33.2% and 10.3% for dysarthric and typical speech, respectively.
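To make the parameter-count argument concrete, the following is a minimal sketch, assuming a SincNet-style band-pass parameterisation in which each first-layer filter is defined by only two values (a low and a high cut-off frequency). This is one well-known family of parametric filters and is used here purely as an illustration; the filter families, filter count, kernel length and sampling rate actually used in the paper are not stated in this description, so `num_filters=40`, `kernel_len=129` and `sample_rate=16000` are hypothetical defaults, and the function names are made up for this example.

```python
import math


def sinc_bandpass_kernel(f_low_hz, f_high_hz, kernel_len=129, sample_rate=16000):
    """Build one band-pass FIR kernel from two parameters (the cut-offs).

    Illustrative only: kernel_len and sample_rate are assumed values.
    """
    assert kernel_len % 2 == 1, "use an odd length so the kernel is symmetric"
    half = kernel_len // 2
    f1 = f_low_hz / sample_rate   # normalised cut-offs in cycles per sample
    f2 = f_high_hz / sample_rate
    kernel = []
    for n in range(-half, half + 1):
        # Ideal band-pass = difference of two ideal low-pass (sinc) filters.
        if n == 0:
            h = 2.0 * (f2 - f1)
        else:
            h = (math.sin(2 * math.pi * f2 * n)
                 - math.sin(2 * math.pi * f1 * n)) / (math.pi * n)
        # Hamming window to reduce ripple caused by truncating the sinc.
        w = 0.54 - 0.46 * math.cos(2 * math.pi * (n + half) / (kernel_len - 1))
        kernel.append(h * w)
    return kernel


def first_layer_param_counts(num_filters=40, kernel_len=129):
    """Trainable weights in the first layer (biases ignored).

    A non-parametric CNN learns every tap of every kernel; the sinc-based
    layer learns only two cut-off frequencies per filter.
    """
    non_parametric = num_filters * kernel_len
    parametric = num_filters * 2
    return non_parametric, parametric


def gain_at(kernel, freq_hz, sample_rate=16000):
    """Magnitude response of a symmetric kernel at one frequency."""
    half = len(kernel) // 2
    return abs(sum(h * math.cos(2 * math.pi * freq_hz * (i - half) / sample_rate)
                   for i, h in enumerate(kernel)))


if __name__ == "__main__":
    k = sinc_bandpass_kernel(500.0, 1500.0)
    print(len(k), first_layer_param_counts())
    print(gain_at(k, 1000.0), gain_at(k, 4000.0))
```

Under these assumed settings the whole 129-tap kernel is regenerated from two numbers, so gradient updates only move the two cut-offs rather than every tap. That is the sense in which a parametric layer has far fewer free parameters to estimate from scarce data.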
Files
| Name | Size |
|---|---|
| INTERSPEECH_2022 (12).pdf (md5:d9374db0d4da549692a748e8cc196d2f) | 351.5 kB |
Additional details
Funding
- UK Research and Innovation
- SpeechWave EP/R012180/1