Parallel Representation Learning for the Classification of Pathological Speech: Studies on Parkinson's Disease and Cleft Lip and Palate

Juan Camilo Vasquez-Correa; Tomas Arias-Vergara; Maria Schuster; Juan Rafael Orozco-Arroyave; Elmar Nöth

doi:10.1016/j.specom.2020.07.005

Published July 29, 2020 | Version v1

Journal article Open

1. Friedrich Alexander University Erlangen-Nuremberg
2. Ludwig Maximilians University Munich
3. University of Antioquia

Speech signals carry paralinguistic information, including the presence of pathologies that impair a speaker’s ability to communicate. These speech disorders have different origins depending on the type of disease: some are morphological, such as cleft lip and palate, which causes hypernasality, and others are neurodegenerative, such as Parkinson’s disease, which produces hypokinetic dysarthria. Automatic assessment of pathological speech can support diagnosis and the evaluation of disease severity. Conventional methods are based on manually assessing single features such as jitter, shimmer, or formant frequencies, which may not model all of the phenomena that appear due to the disease. This paper introduces a novel strategy based on unsupervised representation learning for the automatic detection of pathological speech. The proposed approach uses recurrent and convolutional autoencoders trained to extract informative features that characterize the presence of pathologies in speech. A novel feature set based on the reconstruction error of the autoencoders is also proposed. The performance of the models is evaluated by classifying pathological speech signals recorded from people with Parkinson’s disease and from children with cleft lip and palate. All participants in this study were native Spanish speakers. The proposed models classify the speech signals of both diseases accurately, with an accuracy of up to 97% for cleft lip and palate and up to 84% for Parkinson’s disease. We also show that the reconstruction error from the autoencoders in different frequency regions contains information related to specific speech symptoms of both diseases.
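The idea of using reconstruction error in different frequency regions as a feature can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not include the recurrent or convolutional autoencoders. The function name `band_reconstruction_error`, the equal-width split of the frequency axis, and the toy data are all assumptions made for illustration. Any model that returns a reconstruction with the same shape as its input spectrogram could stand in for `recon`.

```python
import numpy as np


def band_reconstruction_error(spec, recon, n_bands=4):
    """Mean squared reconstruction error per frequency band (hypothetical helper).

    spec, recon: arrays of shape (n_freq_bins, n_frames).
    Returns an array of length n_bands.
    """
    spec = np.asarray(spec, dtype=float)
    recon = np.asarray(recon, dtype=float)
    if spec.shape != recon.shape:
        raise ValueError("spec and recon must have the same shape")
    # Squared error per time-frequency cell, averaged over the time frames.
    err_per_bin = ((spec - recon) ** 2).mean(axis=1)
    # Group neighbouring frequency bins into bands and average within each band.
    bands = np.array_split(err_per_bin, n_bands)
    return np.array([band.mean() for band in bands])


# Toy input: a random "spectrogram" and a stand-in reconstruction that is
# exact in the lower half of the frequency axis and offset in the upper half.
rng = np.random.default_rng(0)
spec = rng.random((8, 5))
recon = spec.copy()
recon[4:, :] += 0.5
features = band_reconstruction_error(spec, recon, n_bands=2)
```

In this toy case the lower band has zero error and the upper band has an error of about 0.25, so the resulting vector shows which frequency region the stand-in model reconstructs poorly. A vector of this kind, computed per recording, could then be passed to any standard classifier.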

Files

Speechcom_Parallel_representation_learning.pdf (804.7 kB, md5:0d82f3ca46b31e095b24bdf3bb8f0ffb)

Additional details

European Commission
TAPAS - Training Network on Automatic Processing of PAthological Speech (grant agreement no. 766287)
