Published November 27, 2023 | Version 0.0.1 | Model | Open access
THCHS-30 Chinese TTS
Description
This upload contains a text-to-speech (TTS) model trained on the THCHS-30 dataset. Training used the IPA transcriptions from "THCHS-30 - Aligned IPA transcriptions" (0.0.1), with the explicit phoneme duration markers removed. The model was trained with tacotron-cli (v0.0.4).
The model achieves the following values on the validation set:
- mean mel-cepstral distance: 25.19
- mean penalty: 0.1456
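For orientation, the textbook definition of mel-cepstral distortion is shown below. It compares reference coefficients $c_{t,d}$ with synthesized coefficients $\hat{c}_{t,d}$ and averages over $T$ aligned frames of $D$ coefficients each. By convention, the energy coefficient $c_{t,0}$ is excluded.

This is only a reference point. The description does not state which variant tacotron-cli (v0.0.4) uses: the scaling constant, the number of coefficients and the frame alignment are all unspecified, as is the definition of the penalty. The value 25.19 above should therefore not be assumed to be on the same scale as values computed with this formula.

$$
\mathrm{MCD} = \frac{1}{T}\sum_{t=1}^{T}\frac{10}{\ln 10}\sqrt{2\sum_{d=1}^{D}\left(c_{t,d}-\hat{c}_{t,d}\right)^{2}}
$$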
Files:
- 103500.pt
  - checkpoint after 500 epochs with a batch size of 64
- 1-setup-env.sh
  - script to install all required tools
- 2-create-dataset.sh
  - script to create the base dataset using public resources
- 3-create-train-val-set.sh
  - script to create the training set and validation set
- 4-start-training.sh
  - script to start training using Tacotron
- 5-convert-chinese-to-ipa.sh
  - script to prepare Chinese texts for synthesis by transcribing them to IPA
- 6-synthesize.sh
  - script to synthesize IPA transcribed text
- example-north-wind.zip
  - contains an example passage which was synthesized using the model (speaker: D7)
The model is able to synthesize the following symbols:
- vowels: a ɛ e ə ɚ ɤ i o ɹ̩ ɻ ɻ̩ u ʊ y
- diphthongs: ai̯ au̯ ei̯ ou̯
- consonants: f j k kʰ l m n p pʰ s t ts tsʰ tɕ tɕʰ tʰ w x ŋ ɕ ɥ ʂ ʈʂ ʈʂʰ
- breaks:
  - SIL0 -> no break
  - SIL1 -> short break
  - SIL2 -> break
  - SIL3 -> long break
- special characters: 。 ?
Vowels and diphthongs carry one of the following tone markers (Chao tone letters):
- ˥ -> first tone, e.g., e˥
- ˧˥ -> second tone, e.g., e˧˥
- ˧˩˧ -> third tone, e.g., e˧˩˧
- ˥˩ -> fourth tone, e.g., e˥˩
- no marker -> no tone, e.g., e
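The symbol inventory and tone markers above are enough to check a transcription before synthesis. The sketch below uses hypothetical helpers; `split_tone` and `is_supported` are not part of the uploaded scripts.

It assumes that a tone marker appears only at the end of a vowel or diphthong. It also matches the longest marker first, because ˧˥ (second tone) ends in the same letter as ˥ (first tone).

```python
from typing import Optional, Tuple

# Symbol inventory copied from the lists above (combining diacritics included).
VOWELS = set("a ɛ e ə ɚ ɤ i o ɹ̩ ɻ ɻ̩ u ʊ y".split())
DIPHTHONGS = set("ai̯ au̯ ei̯ ou̯".split())
CONSONANTS = set("f j k kʰ l m n p pʰ s t ts tsʰ tɕ tɕʰ tʰ w x ŋ ɕ ɥ ʂ ʈʂ ʈʂʰ".split())
BREAKS = {"SIL0", "SIL1", "SIL2", "SIL3"}
SPECIAL = {"。", "?"}

# Tone markers ordered longest first, so "˧˥" (tone 2) is tried before "˥" (tone 1).
TONE_MARKERS = [("˧˩˧", 3), ("˧˥", 2), ("˥˩", 4), ("˥", 1)]


def split_tone(symbol: str) -> Tuple[str, Optional[int]]:
    """Split e.g. "e˧˥" into ("e", 2); a symbol without a marker gets tone None."""
    for marker, tone in TONE_MARKERS:
        if symbol.endswith(marker) and len(symbol) > len(marker):
            return symbol[: -len(marker)], tone
    return symbol, None


def is_supported(symbol: str) -> bool:
    """Return True if the symbol belongs to the inventory listed above."""
    base, tone = split_tone(symbol)
    if tone is not None:
        # Only vowels and diphthongs may carry a tone marker.
        return base in VOWELS or base in DIPHTHONGS
    return base in VOWELS | DIPHTHONGS | CONSONANTS | BREAKS | SPECIAL


print(split_tone("ai̯˥˩")[1])  # 4
print(is_supported("ʈʂʰ"))     # True
print(is_supported("tʰ˥"))     # False: consonants carry no tone
```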
Available speakers:
- male: A9, A33, A35, B21, B34, A8, B8, C8, D8
- female: A11, A12, A13, A14, A19, A2, A22, A23, A32, A34, A36, A4, A5, A6, A7, B11, B12, B15, B2, B22, B31, B32, B33, B4, B6, B7, C12, C13, C14, C17, C18, C19, C2, C20, C21, C22, C23, C31, C32, C4, C6, C7, D11, D12, D13, D21, D31, D32, D4, D6, D7
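Before synthesis, a requested speaker ID can be checked against the lists above. The lookup below is a minimal sketch built from those lists. The helper name `speaker_gender` is hypothetical and not part of the uploaded scripts.

```python
# Speaker IDs copied from the lists above.
MALE = "A9, A33, A35, B21, B34, A8, B8, C8, D8".split(", ")
FEMALE = (
    "A11, A12, A13, A14, A19, A2, A22, A23, A32, A34, A36, A4, A5, A6, A7, "
    "B11, B12, B15, B2, B22, B31, B32, B33, B4, B6, B7, "
    "C12, C13, C14, C17, C18, C19, C2, C20, C21, C22, C23, C31, C32, C4, C6, C7, "
    "D11, D12, D13, D21, D31, D32, D4, D6, D7"
).split(", ")

# Map every available speaker ID to its gender.
SPEAKERS = {**{s: "male" for s in MALE}, **{s: "female" for s in FEMALE}}


def speaker_gender(speaker_id: str) -> str:
    """Return "male" or "female"; raise ValueError for IDs not listed above."""
    try:
        return SPEAKERS[speaker_id]
    except KeyError:
        raise ValueError(f"unknown speaker: {speaker_id!r}") from None


print(speaker_gender("D7"))  # female (the speaker of the example passage)
```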
Example sentence and its IPA transcription (symbols separated by |):
有一次, 北风 和 太阳 正在 争论 谁 比较 有本事。
j|ou̯˧˩˧|i˥|tsʰ|ɹ̩˥˩|SIL2|p|ei̯˧˩˧|f|ə˥|ŋ|SIL0|x|ɤ˧˥|SIL0|tʰ|ai̯˥˩|j|a˧˥|ŋ|SIL0|ʈʂ|ə˥˩|ŋ|ts|ai̯˥˩|SIL0|ʈʂ|ə˥|ŋ|l|w|ə˥˩|n|SIL0|ʂ|w|ei̯˧˥|SIL0|p|i˧˩˧|tɕ|j|au̯˥˩|SIL0|j|ou̯˧˩˧|p|ə˧˩˧|n|ʂ|ɻ̩˥˩|。
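In this example, the comma becomes SIL2 and each remaining space between words becomes SIL0. The sentence-final 。 is kept as a symbol of its own.

The hypothetical helper below (not part of the uploaded scripts) splits a "|"-separated transcription at its break symbols. Applied to the example, it recovers the nine word groups of the sentence.

```python
from typing import List, Tuple

# Example transcription copied from above; symbols are separated by "|".
TRANSCRIPTION = (
    "j|ou̯˧˩˧|i˥|tsʰ|ɹ̩˥˩|SIL2|p|ei̯˧˩˧|f|ə˥|ŋ|SIL0|x|ɤ˧˥|SIL0|"
    "tʰ|ai̯˥˩|j|a˧˥|ŋ|SIL0|ʈʂ|ə˥˩|ŋ|ts|ai̯˥˩|SIL0|ʈʂ|ə˥|ŋ|l|w|ə˥˩|n|SIL0|"
    "ʂ|w|ei̯˧˥|SIL0|p|i˧˩˧|tɕ|j|au̯˥˩|SIL0|j|ou̯˧˩˧|p|ə˧˩˧|n|ʂ|ɻ̩˥˩|。"
)
BREAK_SYMBOLS = {"SIL0", "SIL1", "SIL2", "SIL3"}


def split_at_breaks(transcription: str) -> Tuple[List[List[str]], List[str]]:
    """Split a transcription into symbol groups and the break symbols between them."""
    groups: List[List[str]] = []
    breaks: List[str] = []
    current: List[str] = []
    for symbol in transcription.split("|"):
        if symbol in BREAK_SYMBOLS:
            # A break closes the current group.
            groups.append(current)
            breaks.append(symbol)
            current = []
        else:
            current.append(symbol)
    if current:
        groups.append(current)
    return groups, breaks


groups, breaks = split_at_breaks(TRANSCRIPTION)
print(len(groups), breaks.count("SIL2"), breaks.count("SIL0"))  # 9 1 7
```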
Files (342.8 MB)

Name | Size | Checksum |
---|---|---|
1-setup-env.sh | 728 Bytes | md5:aa8de1324b822cb70447e74a7f514ddb |
103500.pt | 341.6 MB | md5:54e5b3dd2e3d0a35922c361b0bd82f19 |
2-create-dataset.sh | 3.0 kB | md5:3784afec07415a697afac0baeb3952f2 |
3-create-train-val-set.sh | 16.2 kB | md5:11de1b5ca5e13a7a586cc3e51b044a64 |
4-start-training.sh | 1.9 kB | md5:e9237e3038dcfa9c143fb29aab914c19 |
5-convert-chinese-to-ipa.sh | 2.9 kB | md5:2bef60367cea7cd09ffc35a9bafea20f |
6-synthesize.sh | 2.7 kB | md5:e6db33a699ab6b23ca8938cfe1e26c75 |
example-north-wind.zip | 1.2 MB | md5:98fee13d863016a14ea8ac8d7f9cf5c3 |
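The MD5 checksums above can be verified after download with Python's standard `hashlib` module. This is a minimal sketch: the helper names are hypothetical, and the paths assume the files were saved under their original names. The file is read in chunks so that the large checkpoint never has to be held in memory at once.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's MD5 hex digest, reading it in 1 MiB chunks by default."""
    digest = hashlib.md5()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: Path, expected: str) -> bool:
    """Compare against a checksum given as "md5:<hex>" (as listed above) or as bare hex."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `verify_md5(Path("103500.pt"), "md5:54e5b3dd2e3d0a35922c361b0bd82f19")` should return `True` for an intact download of the checkpoint, which is the 341.6 MB entry.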
Additional details
References
- Wang, D., Wu, D., & Zhu, X. (2001). TCMSD: A New Chinese Continuous Speech Database. International Conference on Chinese Computing (ICCC'01), 2001.
- Wang, D., Zhang, X., & Zhang, Z. (2015). THCHS-30: A Free Chinese Speech Corpus (arXiv:1512.01882). arXiv. http://arxiv.org/abs/1512.01882
- Wang, Y., Skerry-Ryan, R. J., Stanton, D., Wu, Y., Weiss, R. J., Jaitly, N., Yang, Z., Xiao, Y., Chen, Z., Bengio, S., Le, Q., Agiomyrgiannakis, Y., Clark, R., & Saurous, R. A. (2017). Tacotron: Towards End-to-End Speech Synthesis. Interspeech 2017, 4006–4010. https://doi.org/10.21437/Interspeech.2017-1452
- Shen, J., Pang, R., Weiss, R. J., Schuster, M., Jaitly, N., Yang, Z., Chen, Z., Zhang, Y., Wang, Y., Skerry-Ryan, R., Saurous, R. A., Agiomyrgiannakis, Y., & Wu, Y. (2018). Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4779–4783. https://doi.org/10.1109/ICASSP.2018.8461368
- Prenger, R., Valle, R., & Catanzaro, B. (2019). WaveGlow: A Flow-based Generative Network for Speech Synthesis. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 3617–3621. https://doi.org/10.1109/ICASSP.2019.8683143
- Taubert, S. (2023). THCHS-30 - Aligned IPA transcriptions (0.0.1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7528596
- Taubert, S. (2023). tacotron-cli (0.0.4). Zenodo. https://doi.org/10.5281/zenodo.7543638
- Taubert, S. (2022). waveglow-cli (0.0.1). Zenodo. https://doi.org/10.5281/zenodo.7044345