Parameter-Efficient Fine-Tuning of XLS-R for Arabic Speech Recognition
Description
Arabic Automatic Speech Recognition (ASR) faces persistent challenges due to complex morphology, dialectal variation, and limited labeled data. Large self-supervised models such as wav2vec2-XLSR (XLS-R) perform strongly on Arabic ASR, but their size makes full fine-tuning computationally expensive and impractical in many settings.
This release accompanies our study on parameter-efficient fine-tuning (PEFT) methods for Arabic ASR, providing the first systematic evaluation of LoRA and DoRA applied to a CTC-based self-supervised model (XLS-R). We evaluate full fine-tuning, LoRA, and DoRA on the newly released Mozilla Common Voice Arabic v24.0 dataset.
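As background on why LoRA trains so few parameters: it freezes a pretrained weight matrix of shape d_out × d_in and learns a low-rank update B·A, where A is rank × d_in and B is d_out × rank. The sketch below only does that arithmetic. The layer shapes, rank, and base parameter count in the usage note are hypothetical placeholders, not the configuration used in this study, and the helper names are illustrative.

```python
from typing import Iterable, Tuple


def lora_added_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: A (rank x d_in) plus B (d_out x rank)."""
    return rank * (d_in + d_out)


def trainable_fraction(
    base_params: int,
    adapted_layers: Iterable[Tuple[int, int]],
    rank: int,
) -> float:
    """Share of all parameters that is trainable when only the adapters are trained.

    adapted_layers holds one (d_in, d_out) pair per adapted weight matrix.
    """
    added = sum(lora_added_params(d_in, d_out, rank) for d_in, d_out in adapted_layers)
    return added / (base_params + added)
```

For example, `trainable_fraction(300_000_000, [(1024, 1024)] * 48, rank=8)` gives the share for a hypothetical 300M-parameter model with 48 adapted 1024 × 1024 projections. The share grows linearly with the rank and with the number of adapted matrices.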
Our results show that full fine-tuning achieves 23.03% Word Error Rate (WER), establishing a new state-of-the-art among XLS-R-based Arabic ASR models. LoRA achieves 36.10% WER while training only ~2.2% of model parameters, offering a strong accuracy–efficiency trade-off and enabling lightweight deployment via small adapters. DoRA is evaluated for Arabic speech recognition for the first time.
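For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, insertions) between a hypothesis and its reference, divided by the number of reference words. The following is a minimal standard-library sketch of that definition. It is not the evaluation code shipped with this record, and it assumes whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j]: edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("a b c d", "a x c"))  # 0.5 (one substitution, one deletion)
```

Corpus-level WER is usually computed as total errors over total reference words across all utterances rather than as the mean of per-utterance scores, so the two can differ.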
This Zenodo record includes the training and evaluation code, configuration files, and trained LoRA and DoRA adapters, supporting reproducibility and future research on efficient Arabic ASR systems.
Files
Fine_Tune_XLS_R_on_Common_Voice_(Final).ipynb
Total size: 3.6 GB

| MD5 checksum | Size |
|---|---|
| 32dbc5bcc812c2c742e70bbdad032e83 | 354.4 kB |
| 76f7dbb03a60c2d894f350123b0628e0 | 3.5 kB |
| b6d6b0d6a9a9b641b19acdf156188403 | 81.4 MB |
| 1d9b767044a371cad9ba361bae357902 | 3.4 GB |
| 7134b0b07e4dbaf0135d1b5aeda6a861 | 78.8 MB |
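The MD5 checksums listed above can be used to confirm that a download is intact. A minimal sketch using only the Python standard library follows. The function names are illustrative, and the file is read in chunks so the multi-gigabyte file does not need to fit in memory.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected_md5: str) -> bool:
    """Compare a file's MD5 digest with the value listed on the record."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

Call `matches_checksum(local_path, listed_md5)` with the path of a downloaded file and the checksum shown for that file on the record; `local_path` and `listed_md5` are placeholders here.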