Published June 25, 2025 | Version v2
Dataset Open

ELNER-DZ: A Dataset for Named Entity Recognition and Entity Linking in Algerian Arabic Dialect

Description

ELNER-DZ is the first large-scale dataset for Named Entity Recognition (NER) and Entity Linking (EL) in Algerian Arabic (Darija), covering both Arabic script and Arabizi (Latin script). It was developed as part of a Master’s thesis.

The dataset contains over 2 million dialectal sentences labeled with more than 1.9 million named entities linked to Wikidata QIDs. It includes annotations in JSON format and is suitable for NLP, NER, and EL tasks on low-resource, dialectal, and code-switched data.
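Since each entity is linked to a Wikidata QID, a quick sanity check on the identifiers can be useful when loading the annotations. The sketch below is only illustrative: the record layout is not documented on this page, so the field names `text`, `entities`, `mention`, `label`, and `qid` are hypothetical placeholders, and the sample record is made up rather than taken from the dataset. The only facts relied on are that Wikidata item identifiers have the form `Q` followed by a positive integer and that item pages live under `https://www.wikidata.org/wiki/`.

```python
import json
import re

# Wikidata item identifiers: the letter Q followed by a positive integer.
QID_PATTERN = re.compile(r"Q[1-9]\d*")


def is_valid_qid(value):
    """Return True if value is a string that looks like a Wikidata QID."""
    return isinstance(value, str) and QID_PATTERN.fullmatch(value) is not None


def wikidata_url(qid):
    """Build the Wikidata item page URL for a well-formed QID."""
    if not is_valid_qid(qid):
        raise ValueError(f"not a Wikidata QID: {qid!r}")
    return f"https://www.wikidata.org/wiki/{qid}"


# Hypothetical record: these field names are assumptions for illustration,
# not the dataset's documented schema. Check README.md for the real layout.
sample_record = json.loads(
    '{"text": "placeholder sentence", '
    '"entities": [{"mention": "placeholder", "label": "LOC", "qid": "Q262"}]}'
)

for entity in sample_record["entities"]:
    if is_valid_qid(entity["qid"]):
        print(entity["mention"], "->", wikidata_url(entity["qid"]))
```

Once the actual schema is known from the README, only the key names in the loop should need adjusting.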

Files

README.md

Files (24.6 MB)

Size        Checksum
24.6 MB     md5:987bcd97f9a7c3610110eeabeda6596f
439 Bytes   md5:f089df4e38af1cee099556de48a51571
3.8 kB      md5:4db993231cf03798662bbfe1dc8a2d0e
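The md5 values listed for the files can be used to confirm that a download is intact. Below is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_checksum` are introduced here for illustration, and the local path you pass in is whatever name you saved the file under (file names are not shown in the listing above).

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's md5 with an expected value such as 'md5:987b...'."""
    # Accept the value with or without the leading 'md5:' prefix.
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum(local_path, "md5:987bcd97f9a7c3610110eeabeda6596f")` should return `True` for an uncorrupted copy of the 24.6 MB file, where `local_path` is the location of your download.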

Additional details

Dates

Copyrighted
2025-06-25