Published October 31, 2025 | Version 0.5
Dataset | Open access
Mafoko Companion Dataset for Mafoko Open Multilingual Terminologies Paper
Creators
- Marivate, Vukosi (Project leader) [1, 2, 3, 4]
- Dzingirai, Isheanesu (Researcher) [1, 4]
- Banda, Fiskani Ella (Researcher) [1, 4]
- Lastrucci, Richard (Researcher) [1, 4]
- Sindane, Thapelo (Researcher) [1, 4]
- Madumo, Keabetswe [1, 4]
- Olaleye, Kayode (Researcher) [1, 4]
- Modupe, Abiodun (Researcher) [1, 4]
- Netshifhefhe, Unarine Leo (Researcher) [1, 4]
- Combrink, Herkulaas MvE (Researcher) [5]
- Nakeng, Mohlatlego (Researcher) [1, 4]
- Ledwaba, Matome Brilliant (Researcher) [1, 4]
- 1. University of Pretoria
- 2. Lelapa AI
- 3. African Institute for Data Science and Artificial Intelligence
- 4. Data Science for Social Impact
- 5. University of the Free State
Description
The Mafoko project systematically aggregates, digitises, and standardises fragmented multilingual terminological resources for South Africa’s official languages. These terminologies, sourced from government and academic repositories, have historically been locked in non-machine-readable formats and hard-to-access structures, which has limited their use in linguistic research and NLP development.
This release provides the foundational Mafoko dataset, curated under the equitable, Africa-centred NOODL licensing framework. The data are released in open, machine-readable formats (CSV and JSON) with provenance metadata and ISO language identifiers.
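As a minimal sketch of how a user might load these CSV/JSON files and split the terms by language, the Python below uses only the standard library. It assumes a hypothetical column name `lang` for the ISO language identifier and assumes the JSON files hold a flat array of records; this record confirms neither detail, so check the headers of the released files and adjust `LANG_FIELD` before relying on it. Provenance fields are passed through untouched rather than interpreted.

```python
import csv
import json
from collections import defaultdict
from typing import Iterable

# Hypothetical field name: the column holding the ISO language identifier
# in the actual Mafoko files may be named differently.
LANG_FIELD = "lang"


def load_csv(lines: Iterable[str]) -> list[dict[str, str]]:
    """Read terminology rows from a CSV source (an open file or any
    iterable of lines) into a list of dicts keyed by the header row."""
    return list(csv.DictReader(lines))


def records_from_json(text: str) -> list[dict]:
    """Parse JSON text that is assumed to be an array of record objects."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of records")
    return data


def group_by_language(
    rows: Iterable[dict], lang_field: str = LANG_FIELD
) -> dict[str, list[dict]]:
    """Group records by their language identifier, keeping every other
    field (including any provenance columns) unchanged."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        code = row.get(lang_field)
        if not code:
            # Skip rows with no language identifier instead of guessing one.
            continue
        groups[str(code).strip()].append(row)
    return dict(groups)
```

For a downloaded CSV, open it with `open(path, encoding="utf-8", newline="")` and pass the file handle to `load_csv`; for a JSON file, pass `fh.read()` to `records_from_json`. In both cases the result can then go to `group_by_language`. Skipping rows without a language code, rather than assigning a default, keeps the grouping from silently mislabelling terms.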
Author list
- Vukosi Marivate (University of Pretoria; AfriDSAI; Lelapa AI)
- Isheanesu Dzingirai (University of Pretoria)
- Fiskani Banda (University of Pretoria)
- Richard Lastrucci (University of Pretoria)
- Thapelo Sindane (University of Pretoria)
- Keabetswe Madumo (University of Pretoria)
- Kayode Olaleye (University of Pretoria)
- Abiodun Modupe (University of Pretoria)
- Unarine Netshifhefhe (University of Pretoria)
- Herkulaas Combrink (University of the Free State)
- Mohlatlego Nakeng (University of Pretoria)
- Matome Ledwaba (University of Pretoria)
Corresponding author: vukosi.marivate@cs.up.ac.za
Files
NOODL _Plain‑Language Explainer [V4].pdf
Additional details
Related works
- Is supplement to: Preprint arXiv:2508.03529 (arXiv)
Software
- Repository URL: https://github.com/dsfsi/za-mafoko
- Programming language: Python
- Development status: Active