Building Arabic Speech Recognition System Using HuBERT Model and Studying the Sources of Errors

Sbih, Rima; Jafar, Assef; Kazem, Ali

doi:10.5281/zenodo.14723614

Published January 23, 2025 | Version v1

Journal article | Open access

1. Higher Institute for Applied Sciences and Technology, Damascus, Syria.

This paper presents a speaker-independent, large-vocabulary continuous speech recognition system for Arabic, built on deep neural network models trained with self-supervised learning. The system is based on the HuBERT model and achieves a word error rate (WER) of 19.3%. Evaluation on different data sets shows that the HuBERT-based system has a significant ability to generalize across spoken dialects. A statistical analysis of the Arabic-specific errors produced by the system highlights the need for an error-correcting language model: after an Arabic language model was added, the WER fell to 10.7%. Overall, the study demonstrates the potential of self-supervised speech recognition for Arabic and the importance of language models for improving accuracy.
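The abstract reports WER before and after adding the language model (19.3% and 10.7%), which is a relative reduction of roughly 45%. As a minimal sketch of how such figures are commonly computed, the Python below implements the standard word-level edit-distance definition of WER. It is not the authors' evaluation code: the function names `word_error_rate` and `relative_reduction`, the whitespace tokenization, and the placeholder word tokens are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance divided by reference length.

    Assumption: words are separated by whitespace; no Arabic-specific
    normalization (diacritics, hamza forms, etc.) is applied here.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which an error rate dropped, relative to its starting value."""
    return (before - after) / before


if __name__ == "__main__":
    # Hypothetical placeholder tokens, not data from the paper.
    print(word_error_rate("w1 w2 w3 w4", "w1 wX w3"))
    # WER figures quoted in the abstract.
    print(round(relative_reduction(19.3, 10.7), 3))
```

In the placeholder example, one substitution and one deletion against a four-word reference give a WER of 0.5. A real evaluation of Arabic transcripts would typically also normalize the text before scoring, a step this sketch omits.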

Files

Files (426.9 kB)

Name	Size
-بناء نظام للتعرف على الكلام المنطوق باللغة العربية باستخدام النموذجHuBERT ودراسة مصادر الأخطاء الخاصة باللغة العربية والناتجة عن نظام التعرف.pdf (md5:077b913c6c10d3410f44425c23a898dc)	426.9 kB

	All versions	This version
Views	293	293
Downloads	150	150
Data volume	70.4 MB	70.4 MB
