Published May 20, 2020 | Version v1
Conference paper Open

Source domain data selection for improved transfer learning targeting dysarthric speech recognition

  • 1. The University of Sheffield

Description

This paper presents an improved transfer learning framework applied to robust personalised speech recognition models for speakers with dysarthria. As the baseline of transfer learning, a state-of-theart CNN-TDNN-F ASR acoustic model trained solely on source domain data is adapted onto the target domain via neural network weight adaptation with the limited available data from target dysarthric speakers. Results show that linear weights in neural layers play the most important role for an improved modelling of dysarthric speech evaluated using UASpeech corpus, achieving averaged 11.6% and 7.6% relative recognition improvement in comparison to the conventional speaker-dependent training and data combination, respectively. To further improve the transferability towards target domain, we propose an utterance-based data selection of the source domain data based on the entropy of posterior probability, which is analysed to statistically obey a Gaussian distribution. Compared to a speaker-based data selection via dysarthria similarity measure, this allows for a more accurate selection of the potentially beneficial source domain data for either increasing the target domain training pool or constructing an intermediate domain for incremental transfer learning, resulting in a further absolute recognition performance improvement of nearly 2% added to transfer learning baseline for speakers with moderate to severe dysarthria

Files

Source_Domain_Data_Selection_for_Improved_Transfer_Learning_Targeting_Dysarthric_Speech_Recognition.pdf

Additional details

Funding

TAPAS – Training Network on Automatic Processing of PAthological Speech 766287
European Commission