Published March 18, 2026 | Version v1
Journal article | Open access

KARAKALPAK SPEECH CORPUS: THE FIRST BENCHMARK DATASET FOR AUTOMATIC SPEECH RECOGNITION

  • 1. DSc, Head of Department, Nukus State Technical University, Nukus, Uzbekistan
  • 2. PhD, Kimyo International University in Tashkent, Tashkent, Uzbekistan
  • 3. Senior Lecturer, Nukus State Technical University, Nukus, Uzbekistan
  • 4. Teaching Assistant, Nukus State Technical University, Nukus, Uzbekistan

Description

While large-scale pre-trained models have significantly advanced multilingual Automatic Speech Recognition (ASR), many low-resource languages remain under-served owing to the scarcity of high-quality annotated speech corpora. This paper introduces the Karakalpak Speech Corpus (KSC), the first publicly available benchmark dataset for Karakalpak, a Turkic language spoken by over two million people, primarily in Karakalpakstan, Uzbekistan. The corpus comprises 50 hours of predominantly read speech collected from 25 native speakers with a balanced gender distribution. To establish baseline performance, we fine-tuned the Wav2Vec 2.0 architecture, specifically evaluating the efficacy of transfer learning from multilingual pre-trained models.
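For readers new to this setup, Wav2Vec 2.0 is usually fine-tuned for ASR with a CTC (Connectionist Temporal Classification) output layer, and ASR benchmarks are commonly scored with word error rate (WER). The description does not state the authors' decoding strategy or evaluation metric, so the following is only a minimal sketch under those assumptions: greedy CTC decoding (collapse repeated frame labels, then drop blanks) followed by a word-level Levenshtein WER. The toy vocabulary, the `|` word-delimiter convention, and the function names `ctc_greedy_decode` and `word_error_rate` are hypothetical and not taken from the paper.

```python
from typing import List, Optional, Sequence

# Hypothetical toy vocabulary (assumption): index 0 is the CTC blank and "|"
# marks a word boundary, a convention used by many Wav2Vec 2.0 CTC tokenizers.
BLANK_ID = 0
VOCAB = ["<blank>", "|", "a", "b", "c"]


def ctc_greedy_decode(frame_ids: Sequence[int], vocab: Sequence[str] = VOCAB,
                      blank_id: int = BLANK_ID) -> str:
    """Collapse repeated frame labels, drop blanks, and turn '|' into spaces."""
    chars: List[str] = []
    prev: Optional[int] = None
    for idx in frame_ids:
        # Emit a symbol only when it differs from the previous frame and is not blank;
        # a blank between two identical labels therefore yields a doubled letter.
        if idx != prev and idx != blank_id:
            chars.append(vocab[idx])
        prev = idx
    # Normalise whitespace so leading/trailing/double delimiters disappear.
    return " ".join("".join(chars).replace("|", " ").split())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev_row[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev_row = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        row = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            row[j] = min(prev_row[j] + 1,         # deletion
                         row[j - 1] + 1,          # insertion
                         prev_row[j - 1] + cost)  # substitution or match
        prev_row = row
    return prev_row[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Made-up frame-level argmax output, standing in for a CTC head's predictions.
    frames = [2, 2, 0, 3, 1, 1, 4, 0, 4, 2]
    hypothesis = ctc_greedy_decode(frames)
    print(hypothesis)                                   # ab cca
    print(word_error_rate("ab cca", hypothesis))        # 0.0
    print(word_error_rate("a b c d", "a x c"))          # 0.5
```

Greedy decoding is the simplest choice; published systems often add beam search with a language model, which this sketch deliberately leaves out.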

Files

40_1079-262-269-Kudaybergenov.pdf (324.7 kB)
md5:15e4432cc2396ae14d0520902a78c530
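To confirm that a downloaded copy of the PDF is intact, its MD5 digest can be compared with the checksum listed for the file. The following is a minimal sketch using Python's standard `hashlib`; the local file path is an assumption about where the PDF was saved, and the helper names `md5_of_file` and `verify_download` are hypothetical.

```python
import hashlib
import os

# Checksum as listed on the record; the local path below is an assumption.
EXPECTED_MD5 = "15e4432cc2396ae14d0520902a78c530"
PDF_PATH = "40_1079-262-269-Kudaybergenov.pdf"


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_md5: str) -> bool:
    """True if the file's MD5 matches the expected value (case-insensitive)."""
    return md5_of_file(path) == expected_md5.lower()


if __name__ == "__main__":
    if os.path.exists(PDF_PATH):
        ok = verify_download(PDF_PATH, EXPECTED_MD5)
        print("checksum OK" if ok else "checksum MISMATCH")
    else:
        print(f"{PDF_PATH} not found in the current directory")
```

MD5 is adequate for detecting accidental corruption during download but should not be relied on as protection against deliberate tampering.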

Additional details

References

  • S. Sinh, S. Dey, G. Saha. Improving self-supervised learning model for audio spoofing detection with layer-conditioned embedding fusion. Computer Speech & Language, vol. 86, 2024, 101599, doi:10.1016/j.csl.2023.101599.
  • Z. Kozhirbayev. Kazakh Speech Recognition: Wav2vec2.0 vs. Whisper. Journal of Advances in Information Technology, vol. 14, no. 6, 2023.
  • S. Tian, Z. Li, Z. Lyv, G. Cheng, Q. Xiao, T. Li, M. Zhao. Factorized and progressive knowledge distillation for CTC-based ASR models. Speech Communication, vol. 160, 2024, 103071, doi:10.1016/j.specom.2024.103071.
  • A. Povey, K. Povey. FeruzaSpeech: A 60 Hour Uzbek Read Speech Corpus with Punctuation, Casing, and Context. arXiv preprint, 2024, https://arxiv.org/abs/2410.00035.
  • R. Davronov. Uzbek Speech-to-Text Model with Wav2Vec 2.0. Available at: https://huggingface.co/rifkat.