Published April 4, 2021 | Version v1
Report | Open access

Pretrained Models for An Investigation of the Relation Between Grapheme Embeddings and Pronunciation for Tacotron-based Systems

  • 1. Univ Rennes, CNRS, IRISA, France
  • 2. National Institute of Informatics, Japan

Description

Pretrained models for "An Investigation of the Relation Between Grapheme Embeddings and Pronunciation for Tacotron-based Systems," Antoine Perquin, Erica Cooper, Junichi Yamagishi.
https://arxiv.org/abs/2010.10694

This is a derived work of the SIWIS corpus:
Yamagishi, Junichi; Honnet, Pierre-Edouard; Garner, Philip; Lazaridis, Alexandros. (2017). The SIWIS French Speech Synthesis Database, 2016 [dataset]. University of Edinburgh. School of Informatics. The Centre for Speech Technology Research. https://doi.org/10.7488/ds/1705.

End-to-end models, particularly Tacotron-based ones, are currently a popular solution for text-to-speech synthesis. They can produce high-quality synthetic speech with little to no text preprocessing, since they can be trained directly on either graphemes or phonemes as input. However, when graphemes are used as input, little is known about the relation between the underlying representations learned by the model and word pronunciations. This work investigates this relation for a Tacotron model trained on French graphemes. Our analysis shows that grapheme embeddings are related to phoneme information even though no such information is present during training. Thanks to this property, we show that grapheme embeddings learned by Tacotron models can be useful for tasks such as grapheme-to-phoneme conversion and control of pronunciation in synthetic speech.

Files

french-tacotron-models.zip (1.1 GB)
md5:8f37e55acdeca8d84480d1c35a462a02
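Because the archive is large, it can be worth checking that the download is complete before unpacking it. The sketch below is a minimal example using only the Python standard library: it computes the MD5 digest of the file in chunks and compares it with the checksum listed for french-tacotron-models.zip. The local path is an assumption about where the file was saved, and the helper names are hypothetical.

```python
import hashlib
from pathlib import Path

# Checksum published in the file listing for french-tacotron-models.zip.
EXPECTED_MD5 = "8f37e55acdeca8d84480d1c35a462a02"


def md5_of_file(path, chunk_size=1 << 20):
    """Hypothetical helper: MD5 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until EOF so the 1.1 GB archive never sits in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Hypothetical helper: True if the file's MD5 matches the expected checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the archive was saved.
    archive = Path("french-tacotron-models.zip")
    if archive.exists():
        print("OK" if verify_archive(archive) else "Checksum mismatch")
    else:
        print(f"{archive} not found")
```

Reading in chunks keeps memory use constant regardless of file size; a mismatch usually indicates an interrupted or corrupted download.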

Additional details

References

  • Perquin, Antoine et al. (2021). "An Investigation of the Relation Between Grapheme Embeddings and Pronunciation for Tacotron-based Systems."
  • Yamagishi, Junichi et al. (2017). "The SIWIS French Speech Synthesis Database."