Published November 10, 2023 | Version 0.0.1
Model | Open access

LJ Speech English TTS

  • 1. Chemnitz University of Technology

Description

This upload contains a text-to-speech (TTS) model trained on the LJ Speech dataset using these transcriptions, with the explicit phoneme duration markers removed. The model was trained using tacotron-cli.

The model achieves the following scores on the validation set (MOS: mean opinion score; GT: ground-truth recordings):

  • MOS naturalness: 3.49 ± 0.28 (GT: 4.17 ± 0.23)
  • MOS intelligibility: 4.44 ± 0.21 (GT: 4.63 ± 0.19)
  • mean mel-cepstral distance: 30.96
  • mean penalty: 0.1341

Files:

  • 101000.pt
    • checkpoint after 500 epochs with a batch size of 64
  • 1-setup-env.sh
    • script to install all required tools
  • 2-create-dataset.sh
    • script to create the base dataset using public resources
  • 3-create-train-val-set.sh
    • script to create the training set and validation set
  • 4-start-training.sh
    • script to start training using Tacotron
  • 5-convert-english-to-ipa.sh
    • script to prepare English texts for synthesis by transcribing them to IPA
  • 6-synthesize.sh
    • script to synthesize IPA-transcribed text
  • example-north-wind.zip
    • contains an example passage which was synthesized using the model

The model is able to synthesize the following symbols:

  • vowels: i, u, æ, ɑ, ɔ, ə, ɛ, ɪ, ʊ, ʌ
  • diphthongs: aɪ, aʊ, eɪ, oʊ, ɔɪ
  • r-colored vowels: ɔr, ər, ɛr, ɪr, ʊr, ʌr
  • consonants: b, d, dʒ, f, h, j, k, l, m, n, p, r, s, t, tʃ, v, w, z, ð, ŋ, ɡ, ʃ, θ
  • breaks: SIL0, SIL1, SIL2, SIL3
  • special characters: . ? ! , : ; - — " ' ( ) [ ]

Each vowel, diphthong, and r-colored vowel can carry a leading stress mark, either ˈ (primary stress) or ˌ (secondary stress), e.g., ˈoʊ.
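To make the symbol rules above concrete, the following Python sketch checks whether a single symbol belongs to the listed inventory, accepting a leading primary or secondary stress mark only on vowels, diphthongs, and r-colored vowels. The function `is_valid_symbol` is hypothetical and not part of tacotron-cli. It also assumes that the ɡ in the consonant list is the IPA letter U+0261 rather than ASCII g, and that the dash among the special characters is U+2014.

```python
# Symbol inventory as listed in this record.
# Assumptions: the consonant ɡ is the IPA letter U+0261 (not ASCII g),
# and the dash among the special characters is U+2014.
VOWELS = {"i", "u", "æ", "ɑ", "ɔ", "ə", "ɛ", "ɪ", "ʊ", "ʌ"}
DIPHTHONGS = {"aɪ", "aʊ", "eɪ", "oʊ", "ɔɪ"}
R_COLORED_VOWELS = {"ɔr", "ər", "ɛr", "ɪr", "ʊr", "ʌr"}
CONSONANTS = {
    "b", "d", "dʒ", "f", "h", "j", "k", "l", "m", "n", "p", "r", "s", "t",
    "tʃ", "v", "w", "z", "ð", "ŋ", "\u0261", "ʃ", "θ",
}
BREAKS = {"SIL0", "SIL1", "SIL2", "SIL3"}
SPECIAL = set(".?!,:;-\"'()[]") | {"\u2014"}

# Primary (U+02C8) and secondary (U+02CC) stress marks.
STRESS_MARKS = {"\u02c8", "\u02cc"}
STRESSABLE = VOWELS | DIPHTHONGS | R_COLORED_VOWELS
UNSTRESSABLE = CONSONANTS | BREAKS | SPECIAL


def is_valid_symbol(symbol: str) -> bool:
    """Return True if symbol is in the inventory, optionally with one leading stress mark."""
    if symbol[:1] in STRESS_MARKS:
        # A stress mark may only precede a vowel, diphthong or r-colored vowel.
        return symbol[1:] in STRESSABLE
    return symbol in STRESSABLE or symbol in UNSTRESSABLE


print(is_valid_symbol("ˈoʊ"))  # True
print(is_valid_symbol("ˈk"))   # False
```

Keeping stressable and unstressable symbols in separate sets makes the stress rule a single membership test, so a marked consonant such as ˈk is rejected while an unmarked one is accepted.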

Example:

The North Wind and the Sun were disputing which was the stronger, when a traveler came along wrapped in a warm cloak.

ð|ə|SIL0|n|ˈɔr|θ|SIL0|w|ˈɪ|n|d|SIL0|ə|n|d|SIL0|ð|ə|SIL0|s|ˈʌ|n|SIL0|w|ˈʌr|SIL0|d|ɪ|s|p|j|ˈu|t|ɪ|ŋ|SIL0|w|ˈɪ|tʃ|SIL0|w|ˈɑ|z|SIL0|ð|ə|SIL0|s|t|r|ˈɔ|ŋ|ər|,|SIL1|w|ˈɛ|n|SIL0|ə|SIL0|t|r|ˈæ|v|ə|l|ər|SIL0|k|ˈeɪ|m|SIL0|ə|l|ˈɔ|ŋ|SIL0|r|ˈæ|p|t|SIL0|ɪ|n|SIL0|ə|SIL0|w|ˈɔr|m|SIL0|k|l|ˈoʊ|k|.|SIL2
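In this transcription, symbols are separated by `|`; SIL0 falls between words, while SIL1 and SIL2 follow the comma and the final period. The Python sketch below groups such a transcription into segments at SIL break tokens and strips stress marks from a segment. The helper names `split_words` and `strip_stress` are hypothetical, and treating every SIL token as a segment boundary is an assumption drawn from this one example, not a documented rule of tacotron-cli.

```python
import re

# Any break token SIL0..SIL3 is treated as a segment boundary (assumption based on the example).
BREAK_TOKEN = re.compile(r"SIL[0-3]")
STRESS_MARKS = "\u02c8\u02cc"  # primary and secondary stress


def split_words(transcription: str) -> list[list[str]]:
    """Split a '|'-separated transcription into symbol groups delimited by SIL tokens."""
    groups: list[list[str]] = []
    current: list[str] = []
    for symbol in transcription.split("|"):
        if BREAK_TOKEN.fullmatch(symbol):
            # Close the current group; consecutive breaks do not create empty groups.
            if current:
                groups.append(current)
            current = []
        elif symbol:
            current.append(symbol)
    if current:
        groups.append(current)
    return groups


def strip_stress(group: list[str]) -> str:
    """Join a symbol group into one string with leading stress marks removed."""
    return "".join(symbol.lstrip(STRESS_MARKS) for symbol in group)


groups = split_words("ð|ə|SIL0|s|ˈʌ|n|.|SIL2")
print(groups)                   # [['ð', 'ə'], ['s', 'ˈʌ', 'n', '.']]
print(strip_stress(groups[1]))  # sʌn.
```

Applied to the full example, this yields one group per word of the English sentence, with punctuation kept in the group it follows.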

Notes

The authors gratefully acknowledge the support of the GWK, which funded this project by providing computing time through the Center for Information Services and HPC (ZIH) at TU Dresden.

The authors are grateful to the Center for Information Services and High Performance Computing [Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)] at TU Dresden for providing its facilities for high-throughput calculations.

Files (342.7 MB)

Size | MD5 checksum
726 Bytes | 664cac6dcc130411e4d049e1538c516c
341.5 MB | 9c77b2ee8dfdb50ac556588064943846
673 Bytes | 4370d52bc842f8f762465f66a9559260
14.2 kB | 28497962271dadc210b536d88c66dc83
2.0 kB | ee60232dcf4ee20c843aa9983bf5503c
3.1 kB | a07f757776a4276b8927eb60e4ed50ad
2.7 kB | 50c58aa494904442f70d509c86edab0c
1.2 MB | bc6c1271423dd42c181f19ffa036ab8e
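The listing gives an MD5 checksum for each file. A minimal Python sketch for checking a download against one of these values uses the standard-library `hashlib` module and reads the file in fixed-size chunks, so the large checkpoint file does not have to fit in memory at once. The helper names `md5_of_file` and `matches_checksum` are hypothetical.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def md5_of_file(path: str | Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks of chunk_size bytes."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str | Path, expected: str) -> bool:
    """Compare a file's MD5 with an expected value, with or without an 'md5:' prefix."""
    expected_hex = expected.strip().lower().removeprefix("md5:")
    return md5_of_file(path) == expected_hex


# Usage (placeholder path and checksum):
# matches_checksum("path/to/downloaded-file", "<checksum from the listing>")
```

MD5 is adequate for detecting corrupted or incomplete downloads, which is what these checksums are for; it is not meant as protection against deliberate tampering.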

Additional details

Funding

Deutsche Forschungsgemeinschaft
SFB 1410 416228727

References