Multimodal and multi-output deep learning architectures for the automatic assessment of voice quality using the GRB scale

Julián David Arias-Londoño; Jorge Andrés Gómez-García; Juan Ignacio Godino-Llorente

doi:10.1109/JSTSP.2019.2956410

Published November 28, 2019 | Version v1

Journal article | Open Access

1. Universidad de Antioquia, Medellín, Colombia
2. Universidad Politécnica de Madrid, Spain

This paper addresses the automatic assessment of voice quality according to the GRB scale, using a variety of deep learning architectures for prediction. The proposed architectures are multimodal, because they employ multiple sources of information, and multi-output, because they simultaneously predict all the traits of the GRB scale. A feature engineering approach is followed, based on deep neural networks and a set of well-established features such as mel-frequency cepstral coefficients (MFCCs), perturbation and complexity characteristics. Likewise, a representation learning approach is considered, using convolutional neural networks fed with modulation spectra extracted from the voices. Finally, diverse loss functions are investigated, including two surrogate ordinal classification losses, a conventional weighted categorical cross-entropy, and a mean squared error function. Experiments are carried out on a dataset containing recordings of the sustained phonation of three vowels. The best deep learning architecture provides a relative performance improvement of 6.25% for G, 14.1% for R and 18.1% for B, in comparison with recently published results on the same dataset.
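The abstract names the loss families but does not define them. As a rough illustration only, the sketch below shows one common form of each for a single GRB trait, assuming each trait is rated on the usual 0 to 3 ordinal grade scale. Several choices are assumptions not stated in the abstract: the inverse-frequency class weights, the cumulative "grade > t" encoding (one standard ordinal surrogate, not necessarily either of the two used in the paper), and the equal-weight sum over G, R and B for the multi-output loss. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

NUM_GRADES = 4  # assumption: each GRB trait is rated on the 0-3 ordinal scale
TRAITS = ("G", "R", "B")
EPS = 1e-12  # guards against log(0)


def inverse_frequency_weights(labels: Sequence[int], num_grades: int = NUM_GRADES) -> List[float]:
    """Assumed weighting scheme: w_c = N / (K * n_c); grades absent from labels get 0."""
    counts = [0] * num_grades
    for y in labels:
        counts[y] += 1
    n = len(labels)
    return [n / (num_grades * c) if c > 0 else 0.0 for c in counts]


def weighted_categorical_cross_entropy(probs: Sequence[float], label: int,
                                       class_weights: Sequence[float]) -> float:
    """-w[label] * log p[label] for one sample; probs is a softmax output over grades."""
    return -class_weights[label] * math.log(max(probs[label], EPS))


def mean_squared_error(prediction: float, label: int) -> float:
    """Regression view: the grade is treated as a real-valued target."""
    return (prediction - label) ** 2


def ordinal_targets(label: int, num_grades: int = NUM_GRADES) -> List[float]:
    """Cumulative encoding: grade k -> [k > 0, k > 1, ..., k > K-2]."""
    return [1.0 if label > t else 0.0 for t in range(num_grades - 1)]


def ordinal_binary_cross_entropy(threshold_probs: Sequence[float], label: int) -> float:
    """Sum of binary cross-entropies over the K-1 'grade > t' sub-problems."""
    targets = ordinal_targets(label, len(threshold_probs) + 1)
    loss = 0.0
    for p, target in zip(threshold_probs, targets):
        p = min(max(p, EPS), 1.0 - EPS)  # clamp so both logs stay finite
        loss -= target * math.log(p) + (1.0 - target) * math.log(1.0 - p)
    return loss


def ordinal_decode(threshold_probs: Sequence[float]) -> int:
    """Predicted grade = number of thresholds the model says are exceeded."""
    return sum(1 for p in threshold_probs if p > 0.5)


def multi_output_loss(probs: Dict[str, Sequence[float]], labels: Dict[str, int],
                      class_weights: Dict[str, Sequence[float]]) -> float:
    """Assumed multi-output objective: equal-weight sum of one weighted CCE per trait."""
    return sum(weighted_categorical_cross_entropy(probs[t], labels[t], class_weights[t])
               for t in TRAITS)
```

The cumulative encoding lets a model exploit the ordering of grades: confusing grade 0 with grade 3 violates three binary sub-problems, whereas confusing grade 2 with grade 3 violates only one. Plain categorical cross-entropy treats all wrong grades alike, and MSE encodes order but assumes equally spaced grades.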

Files: JSTSP-PrePrint.pdf (367.8 kB, md5:5fcde1740b220d4f9a9f0e6aa1291598)