Published November 8, 2018 | Version v1
Conference paper Open

Embedding Context-Dependent Variations of Prosodic Contours using Variational Encoding for Decomposing the Structure of Speech Prosody

  • 1. Faculty of Electrical Engineering and Information Technologies, University of Ss Cyril and Methodius - Skopje
  • 2. Univ. Grenoble-Alpes, CNRS, Grenoble-INP, GIPSA-lab
  • 3. Department of Speech, Hearing and Phonetic Sciences, University College London
  • 4. Idiap Research Institute

Description

The quest for comprehensive generative models of intonation that link linguistic and paralinguistic functions to prosodic forms has been a longstanding challenge in speech communication research. Traditional intonation models have given way to the overwhelming performance of deep learning (DL) techniques, which train general-purpose end-to-end mappings with millions of tunable parameters. The shift towards black-box machine learning models has nonetheless raised the reverse problem: a compelling need to discover knowledge, and to explain, visualise and interpret. Our work bridges the gap between a comprehensive generative model of intonation and state-of-the-art DL techniques. We build upon the modelling paradigm of the Superposition of Functional Contours (SFC) model and propose a Variational Prosody Model (VPM) that uses a network of variational contour generators to capture the context-sensitive variation of the constituent elementary prosodic contours. We show that the VPM can give insight into the intrinsic variability of these prosodic prototypes by learning a meaningful, structured prosodic latent space. We also show that the VPM is able to capture prosodic phenomena that have multiple dimensions of context-based variability. Since it is based on the principle of superposition, the VPM does not require specially crafted corpora for the analysis, opening up the possibility of using big data for prosody analysis.

Files

gerazov18embedding_zenodo.pdf (457.4 kB)
md5:362bbd247f27d44e85dba7f7b90503d3

Additional details

Funding

European Commission: ProsoDeep – Deep understanding and modelling of the hierarchical structure of Prosody (grant 745802)