THCHS-30 Chinese TTS with explicit duration markers

Taubert, Stefan

doi:10.5281/zenodo.10209990

Published November 27, 2023 | Version 0.0.1

Resource type: Model | Access: Open

Taubert, Stefan (Data manager)¹

1. Chemnitz University of Technology

This upload contains a TTS model trained on the THCHS-30 dataset using the aligned IPA transcriptions (Taubert, 2023), which contain explicit phoneme duration markers (see below). The model was trained with tacotron-cli (v0.0.4).

The model achieves the following values on the validation set:

mean mel-cepstral distance: 24.58
mean penalty: 0.1072

Files:

103500.pt
- checkpoint after 500 epochs with a batch size of 64
1-setup-env.sh
- script to install all required tools
2-create-dataset.sh
- script to create the base dataset using public resources
3-create-train-val-set.sh
- script to create the training set and validation set
4-start-training.sh
- script to start training using Tacotron
5-convert-chinese-to-ipa.sh
- script to prepare Chinese texts for synthesis by transcribing them to IPA
6-synthesize.sh
- script to synthesize IPA transcribed text
example-north-wind.zip
- contains an example passage which was synthesized using the model (speaker: D7)

The model is able to synthesize the following symbols:

vowels: a ɛ e ə ɚ ɤ i o u ʊ y
diphthongs: ai̯ au̯ ei̯ ou̯
consonants: f j k kʰ l m n p pʰ ɹ̩ ɻ ɻ̩ s t ts tsʰ tɕ tɕʰ tʰ w x ŋ ɕ ɥ ʂ ʈʂ ʈʂʰ
breaks:
- SIL0 -> no break
- SIL1 -> short break
- SIL2 -> break
- SIL3 -> long break
special characters: 。 ?

Vowels and diphthongs contain one of these tones:

˥ -> first tone, e.g., e˥
˧˥ -> second tone, e.g., e˧˥
˧˩˧ -> third tone, e.g., e˧˩˧
˥˩ -> fourth tone, e.g., e˥˩
nothing -> no tone, e.g., e

Vowels, diphthongs and consonants contain one of these duration markers:

˘ -> very short, e.g., ou̯˘
nothing -> normal, e.g., ou̯
ˑ -> half long, e.g., ou̯ˑ
ː -> long, e.g., ou̯ː

Tones and duration markers can be combined, e.g., ə˧˥ː
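
The notation above can be taken apart programmatically. The following is a minimal sketch, not part of the upload: the helper name `split_symbol` is hypothetical, and it assumes that a tone always precedes a duration marker, as in the examples shown in this description.

```python
# Tone letters listed in the description, longest first so that
# the third tone (˧˩˧) is matched before any shorter sequence.
TONES = ["˧˩˧", "˧˥", "˥˩", "˥"]

# Duration markers listed in the description (no marker means "normal").
DURATIONS = {"˘": "very short", "ˑ": "half long", "ː": "long"}


def split_symbol(symbol):
    """Split one symbol into (base, tone, duration).

    Assumption: the duration marker, if present, is the last character
    and the tone, if present, comes directly before it.
    """
    duration = ""
    if symbol and symbol[-1] in DURATIONS:
        duration = symbol[-1]
        symbol = symbol[:-1]

    tone = ""
    for candidate in TONES:
        if symbol.endswith(candidate):
            tone = candidate
            symbol = symbol[: -len(candidate)]
            break

    return symbol, tone, duration


if __name__ == "__main__":
    for example in ["ə˧˥ː", "ou̯˘", "e˥˩", "ŋ"]:
        print(example, split_symbol(example))
```

Symbols without a tone or duration marker, including the break markers, come back unchanged as the base with two empty strings.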

Available speakers:

male: A9, A33, A35, B21, B34, A8, B8, C8, D8
female: A11, A12, A13, A14, A19, A2, A22, A23, A32, A34, A36, A4, A5, A6, A7, B11, B12, B15, B2, B22, B31, B32, B33, B4, B6, B7, C12, C13, C14, C17, C18, C19, C2, C20, C21, C22, C23, C31, C32, C4, C6, C7, D11, D12, D13, D21, D31, D32, D4, D6, D7

Example sentence:

有一次, 北风和太阳正在争论谁比较有本事。

j|ou̯˧˩˧|i˥|tsʰ|ɹ̩˥˩|SIL2|p|ei̯˧˩˧|f|ə˥|ŋ|SIL0|x|ɤ˧˥|SIL0|tʰ|ai̯˥˩|j|a˧˥˘|ŋ˘|SIL0|ʈʂ|ə˥˩|ŋ|ts|ai̯˥˩|SIL0|ʈʂ|ə˥|ŋ|l|w|ə˥˩ː|nˑ|SIL0|ʂ|w˘|ei̯˧˥|SIL0|p|i˧˩˧|tɕ˘|j|au̯˥˩˘|SIL0|j|ou̯˧˩˧|p|ə˧˩˧|n|ʂ|ɻ̩˥˩|。
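
As an illustration of how such a transcription is structured, the sketch below splits a pipe-separated symbol string into chunks at the break markers and special characters. It is only an assumption-based example: the function name `split_into_chunks` is hypothetical, and the excerpt is the beginning of the example sentence above.

```python
# Break markers and special characters listed in the description.
BREAKS = {
    "SIL0": "no break",
    "SIL1": "short break",
    "SIL2": "break",
    "SIL3": "long break",
}
PUNCTUATION = {"。", "?"}


def split_into_chunks(transcription):
    """Return (symbols, terminator) pairs, one per chunk.

    A chunk ends at a break marker or special character; a trailing
    chunk without such a terminator gets None.
    """
    chunks, current = [], []
    for symbol in transcription.split("|"):
        if symbol in BREAKS or symbol in PUNCTUATION:
            chunks.append((current, symbol))
            current = []
        else:
            current.append(symbol)
    if current:
        chunks.append((current, None))
    return chunks


if __name__ == "__main__":
    excerpt = "j|ou̯˧˩˧|i˥|tsʰ|ɹ̩˥˩|SIL2|p|ei̯˧˩˧|f|ə˥|ŋ|SIL0|x|ɤ˧˥"
    for symbols, terminator in split_into_chunks(excerpt):
        print(" ".join(symbols), "->", terminator)
```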

Notes (English)

The authors gratefully acknowledge the GWK support for funding this project by providing computing time through the Center for Information Services and HPC (ZIH) at TU Dresden.

The authors are grateful to the Center for Information Services and High Performance Computing [Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)] at TU Dresden for providing its facilities for high throughput calculations.

Files (344.7 MB)

Name	MD5	Size
1-setup-env.sh	a3be5e5858d44ac00d543bb7673c8f52	732 Bytes
103500.pt	6c4523fc68b0dcc177db189ec63d9dc8	343.6 MB
2-create-dataset.sh	3784afec07415a697afac0baeb3952f2	3.0 kB
3-create-train-val-set.sh	df3a48296494e451e801974bf7e209a8	15.6 kB
4-start-training.sh	83913682cdda88e3245225111ecb8ffa	1.9 kB
5-convert-chinese-to-ipa.sh	d270f948c6ded9bc198332a73ebaf5da	2.8 kB
6-synthesize.sh	c39bb47023a56e67cc9c92556b5fbed8	2.8 kB
example-north-wind.zip	75c5e26cd77df92aa93ca96f77f95c4c	1.1 MB
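
The MD5 checksums listed for each file can be used to check downloads for corruption. The sketch below is one possible way to do this with the Python standard library; the function names `md5_of` and `verify` are hypothetical, and it assumes the files were saved under their original names in a single directory.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "1-setup-env.sh": "a3be5e5858d44ac00d543bb7673c8f52",
    "103500.pt": "6c4523fc68b0dcc177db189ec63d9dc8",
    "2-create-dataset.sh": "3784afec07415a697afac0baeb3952f2",
    "3-create-train-val-set.sh": "df3a48296494e451e801974bf7e209a8",
    "4-start-training.sh": "83913682cdda88e3245225111ecb8ffa",
    "5-convert-chinese-to-ipa.sh": "d270f948c6ded9bc198332a73ebaf5da",
    "6-synthesize.sh": "c39bb47023a56e67cc9c92556b5fbed8",
    "example-north-wind.zip": "75c5e26cd77df92aa93ca96f77f95c4c",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(directory="."):
    """Map each expected file name to True if present and matching."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify(".").items():
        print("OK      " if ok else "MISMATCH", name)
```

Reading in chunks keeps memory use low for the 343.6 MB checkpoint. A missing file is reported the same way as a mismatching one.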

Additional details

Funding

Deutsche Forschungsgemeinschaft
SFB 1410 416228727

References

Wang, D., Wu, D., & Zhu, X. (2001). TCMSD: A New Chinese Continuous Speech Database. International Conference on Chinese Computing (ICCC'01), 2001.
Wang, D., Zhang, X., & Zhang, Z. (2015). THCHS-30: A Free Chinese Speech Corpus (arXiv:1512.01882). arXiv. http://arxiv.org/abs/1512.01882
Wang, Y., Skerry-Ryan, R. J., Stanton, D., Wu, Y., Weiss, R. J., Jaitly, N., Yang, Z., Xiao, Y., Chen, Z., Bengio, S., Le, Q., Agiomyrgiannakis, Y., Clark, R., & Saurous, R. A. (2017). Tacotron: Towards End-to-End Speech Synthesis. Interspeech 2017, 4006–4010. https://doi.org/10.21437/Interspeech.2017-1452
Shen, J., Pang, R., Weiss, R. J., Schuster, M., Jaitly, N., Yang, Z., Chen, Z., Zhang, Y., Wang, Y., Skerry-Ryan, R., Saurous, R. A., Agiomyrgiannakis, Y., & Wu, Y. (2018). Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4779–4783. https://doi.org/10.1109/ICASSP.2018.8461368
Prenger, R., Valle, R., & Catanzaro, B. (2019). WaveGlow: A Flow-based Generative Network for Speech Synthesis. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 3617–3621. https://doi.org/10.1109/ICASSP.2019.8683143
Taubert, S. (2023). THCHS-30 - Aligned IPA transcriptions (0.0.1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7528596
Taubert, S. (2023). tacotron-cli (0.0.4). Zenodo. https://doi.org/10.5281/zenodo.7543638
Taubert, S. (2022). waveglow-cli (0.0.1). Zenodo. https://doi.org/10.5281/zenodo.7044345
