Source domain data selection for improved transfer learning targeting dysarthric speech recognition

Xiong, Feifei; Barker, Jon; Yue, Zhengjun; Christensen, Heidi

doi:10.5281/zenodo.7180780

Published May 20, 2020 | Version v1

Conference paper | Open Access


1. The University of Sheffield

This paper presents an improved transfer learning framework for building robust, personalised speech recognition models for speakers with dysarthria. As the transfer learning baseline, a state-of-the-art CNN-TDNN-F ASR acoustic model trained solely on source domain data is adapted to the target domain via neural network weight adaptation, using the limited data available from the target dysarthric speakers. Results on the UASpeech corpus show that the linear weights in the neural layers play the most important role in improving the modelling of dysarthric speech, achieving average relative recognition improvements of 11.6% and 7.6% over conventional speaker-dependent training and data combination, respectively. To further improve transferability towards the target domain, we propose an utterance-based selection of source domain data based on the entropy of the posterior probability, which our analysis shows to statistically follow a Gaussian distribution. Compared to speaker-based data selection via a dysarthria similarity measure, this allows a more accurate selection of potentially beneficial source domain data, used either to enlarge the target domain training pool or to construct an intermediate domain for incremental transfer learning. This yields a further absolute recognition improvement of nearly 2% over the transfer learning baseline for speakers with moderate to severe dysarthria.
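The abstract does not specify how the per-utterance entropy is aggregated or which part of the fitted Gaussian is treated as beneficial, so the following is only an illustrative sketch of entropy-based utterance selection, not the authors' method. It assumes frame-level posterior vectors are available for each source utterance, scores an utterance by the mean of its frame entropies, fits a Gaussian to those scores with Python's `statistics.NormalDist.from_samples`, and keeps utterances whose z-score falls inside a chosen window. The function names, the mean-over-frames aggregation, and the `z_low`/`z_high` window are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def frame_entropy(posterior):
    # Shannon entropy (in nats) of one frame's posterior over acoustic-model outputs.
    return -sum(p * math.log(p) for p in posterior if p > 0.0)


def utterance_entropy(frames):
    # Assumption: an utterance is scored by the mean of its frame entropies.
    return fmean(frame_entropy(f) for f in frames)


def select_utterances(posteriors_by_utt, z_low=-1.0, z_high=1.0):
    """Return (sorted selected utterance ids, fitted Gaussian).

    posteriors_by_utt maps a hypothetical utterance id to a list of frames,
    each frame a list of posterior probabilities summing to 1. A Gaussian is
    fitted to the per-utterance entropies; utterances whose z-score lies in
    [z_low, z_high] are kept. This window rule is illustrative only.
    """
    scores = {u: utterance_entropy(f) for u, f in posteriors_by_utt.items()}
    dist = NormalDist.from_samples(list(scores.values()))
    if dist.stdev == 0.0:
        # All utterances have identical entropy: nothing to discriminate on.
        return sorted(scores), dist
    selected = sorted(
        u for u, h in scores.items()
        if z_low <= (h - dist.mean) / dist.stdev <= z_high
    )
    return selected, dist


# Toy usage: four source utterances of three identical frames each,
# ranging from a confident posterior (utt1) to a uniform one (utt4).
toy = {
    "utt1": [[0.97, 0.01, 0.01, 0.01]] * 3,
    "utt2": [[0.7, 0.1, 0.1, 0.1]] * 3,
    "utt3": [[0.4, 0.2, 0.2, 0.2]] * 3,
    "utt4": [[0.25, 0.25, 0.25, 0.25]] * 3,
}
selected, dist = select_utterances(toy, z_low=-1.0, z_high=1.0)
print(selected, round(dist.mean, 4), round(dist.stdev, 4))
```

In this toy run the very confident utterance `utt1` falls more than one standard deviation below the mean entropy and is dropped. Whether low-entropy or high-entropy source utterances are the useful ones for transfer to dysarthric speech is exactly the design decision the paper addresses, so the window bounds here should be treated as placeholders.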

Files

Source_Domain_Data_Selection_for_Improved_Transfer_Learning_Targeting_Dysarthric_Speech_Recognition.pdf (957.5 kB, md5:6ccad9e99bb3582737b9e5d1070c80ec)

Additional details

Funding: European Commission, TAPAS – Training Network on Automatic Processing of PAthological Speech (grant 766287)

