IMPROVING SPEECH NATURALNESS IN UZBEK TEXT-TO-SPEECH USING DEEP LEARNING-BASED PROSODY MODELING

Yuldasheva Umida Husniddin qizi

doi:10.5281/zenodo.18651972

Published February 15, 2026 | Version v1

Dataset Open


Yuldasheva Umida Husniddin qizi (Contact person)¹

1. Samarkand Branch of Tashkent University of Information Technologies

Speech naturalness is one of the most critical challenges in text-to-speech (TTS) systems, especially for low-resource languages such as Uzbek. While recent advances in deep learning have significantly improved the intelligibility of synthesized speech, achieving natural prosody—including appropriate intonation, rhythm, stress, and timing—remains a complex problem. This study focuses on improving speech naturalness in Uzbek TTS systems through deep learning-based prosody modeling. The paper analyzes existing approaches to prosody modeling, discusses the linguistic characteristics of the Uzbek language that affect prosodic patterns, and proposes the integration of neural network-based methods to capture expressive and natural speech features. The findings highlight the potential of deep learning architectures to enhance the quality and naturalness of Uzbek speech synthesis and contribute to the development of more human-like TTS systems.

Files (212.5 kB)

Name	Size
465-468.pdf md5:a43427b0755b61b329e4a903dcb7e54a	212.5 kB


