Dataset Open Access
Mukiibi, Jonathan; Hussein, Ali; Meyer, Joshua; Katumba, Andrew; Nakatumba-Nabende, Joyce
The Makerere AI Lab has built an end-to-end CTC Luganda ASR model using radio data. Having encountered data challenges while working with low-resource languages, we are taking the initiative, together with our partners, to release the first radio corpus for Luganda.
The 155-hour corpus is publicly available online under the Creative Commons BY-NC-ND 4.0 license. The dataset release comprises the following:
NOTE: You can read and cite our paper, published in the Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022).
Name | MD5 checksum | Size |
---|---|---|
makerere_radio_dataset.tar.gz | bd4abf4dcf9cb3949a37a37b595e9026 | 6.4 GB |
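
Because the archive is large, it can be worth checking the download against the MD5 checksum listed above before unpacking it. The following is a minimal sketch of one way to do that in Python, not an official tool shipped with the dataset. The helper names `md5_of_file` and `archive_is_intact` are hypothetical, and the script assumes the archive was saved in the current directory under its original file name.

```python
import hashlib
from pathlib import Path

# Checksum as listed for makerere_radio_dataset.tar.gz in the files table.
EXPECTED_MD5 = "bd4abf4dcf9cb3949a37a37b595e9026"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat even for a multi-gigabyte archive.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected_md5=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected_md5.lower()


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the archive was saved.
    archive = Path("makerere_radio_dataset.tar.gz")
    if archive.exists():
        status = "OK" if archive_is_intact(archive) else "MISMATCH"
        print(f"{archive.name}: {status}")
    else:
        print(f"{archive.name} not found in the current directory")
```

A mismatch usually points to an incomplete or corrupted download, in which case fetching the archive again is the simplest remedy. MD5 is adequate here for detecting transfer errors, but it is not a guarantee against deliberate tampering.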