Published June 25, 2025 | Version v2
Dataset
Open
ELNER-DZ: A Dataset for Named Entity Recognition and Entity Linking in Algerian Arabic Dialect
Authors/Creators
Description
ELNER-DZ is the first large-scale dataset for Named Entity Recognition (NER) and Entity Linking (EL) in Algerian Arabic (Darija), covering both Arabic script and Arabizi (Latin script). It was developed as part of a Master’s thesis.
The dataset contains over 2 million dialectal sentences annotated with more than 1.9 million named entities, each linked to a Wikidata QID. Annotations are provided in JSON format, and the dataset is suitable for NLP, NER, and EL tasks on low-resource, dialectal, and code-switched data.
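As a rough illustration of how JSON annotations with linked entities could be consumed, here is a minimal Python sketch. The record layout (`text`, `entities`, `start`, `end`, `label`, `qid`), the sample sentence, and the QID value are all assumptions made up for this example and are not taken from the dataset; consult README.md for the actual schema.

```python
import json
import re

# Hypothetical record layout: the real ELNER-DZ schema may differ (see README.md).
# The sentence is an invented Arabizi example and "Q12345" is a placeholder,
# not the real Wikidata identifier of any entity.
SAMPLE = """
{
  "text": "rani rayeh l Wahran ghodwa",
  "entities": [
    {"start": 13, "end": 19, "label": "LOC", "qid": "Q12345"}
  ]
}
"""

# Wikidata item identifiers are the letter Q followed by a positive integer.
QID_PATTERN = re.compile(r"Q[1-9]\d*")


def extract_links(record):
    """Return (surface form, label, QID) triples for one annotated sentence."""
    links = []
    for entity in record["entities"]:
        qid = entity["qid"]
        if not QID_PATTERN.fullmatch(qid):
            raise ValueError(f"malformed QID: {qid!r}")
        # Character offsets are assumed to be start-inclusive, end-exclusive.
        surface = record["text"][entity["start"]:entity["end"]]
        links.append((surface, entity["label"], qid))
    return links


record = json.loads(SAMPLE)
links = extract_links(record)
print(links)
```

Validating the QID format before use is a cheap sanity check when joining the annotations against a Wikidata dump or API; the offset convention should be confirmed against the real files before relying on it.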
Files
README.md
Additional details
Dates
- Copyrighted: 2025-06-25