Published March 18, 2026 | Version v1
Journal article
Open
KARAKALPAK SPEECH CORPUS: THE FIRST BENCHMARK DATASET FOR AUTOMATIC SPEECH RECOGNITION
Authors/Creators
- 1. DSc, Head of Department, Nukus State Technical University, Nukus, Uzbekistan
- 2. PhD, Kimyo International University in Tashkent, Tashkent, Uzbekistan
- 3. Senior Lecturer, Nukus State Technical University, Nukus, Uzbekistan
- 4. Teaching Assistant, Nukus State Technical University, Nukus, Uzbekistan
Description
Although large-scale pre-trained models have significantly advanced multilingual Automatic Speech Recognition (ASR), many low-resource languages remain underserved because high-quality annotated speech corpora are scarce. This paper introduces the Karakalpak Speech Corpus (KSC), the first publicly available benchmark dataset for Karakalpak, a Turkic language spoken by over two million people, primarily in Karakalpakstan. The corpus comprises 50 hours of predominantly read speech collected from 25 native speakers with a balanced gender distribution. To establish a performance benchmark, we fine-tuned Wav2Vec 2.0 models and evaluated how effectively transfer learning from multilingual pre-trained models carries over to Karakalpak.
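The description does not say which metric the benchmark reports. ASR baselines are commonly scored with word error rate (WER), so the following is a minimal sketch under that assumption, not the authors' evaluation code. The function name `word_error_rate` and the sample strings are hypothetical and chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: whitespace tokenization with no text normalization.
    A real evaluation would normally lowercase and strip punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcript pair: one substitution and one deletion
# against a four-word reference.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

WER can exceed 1.0 when the hypothesis contains many inserted words, so it is a rate relative to the reference length rather than a bounded percentage.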
Files
| Name | Size |
|---|---|
| 40_1079-262-269-Kudaybergenov.pdf (md5:15e4432cc2396ae14d0520902a78c530) | 324.7 kB |