Published June 1, 2026 | Version 0.1
Dataset Open

A consolidated lexical dataset for Dogon languages

  • 1. CNRS Délégation Paris-Villejuif
  • 2. LLACAN - Langage, Langues et Cultures d'Afrique
  • 3. Researcher
  • 4. IRD-MISELI-Mali

Description

This record is a staged release of a consolidated lexical dataset for Dogon languages. It brings together heterogeneous source layers, including RefLex-derived data, Dogon and Bangime Linguistics materials, CLDF/LexiBank-derived working files, and subsequent BANG project curation. The workflow includes transcription standardization, source and language-name normalization, staged merging, Concepticon and part-of-speech enrichment, manual revision, source-village and GPS verification, doculect construction, and Glottolog alignment.

This release should be treated as provisional. Remaining issues include resolving duplicates, auditing language-level attribution, and verifying that earlier manual curation of verbal paradigms has been incorporated. Full source-level attribution and contributor roles are documented in ATTRIBUTION.md.

Files

ATTRIBUTION.md

Files (72.8 MB)

Checksum and size of each file:

md5:eea809324f494d19a13ca1f32f7352ba  3.3 kB
md5:8af6a2f5bd29c711fa7c3fabfb31bf2b  72.8 MB
md5:2a24e565ea046f685eee5a9ddcd61f42  6.6 kB
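
The MD5 checksums listed above can be used to confirm that downloaded files arrived intact. The following is a minimal sketch, not part of the release itself. It assumes the files have been saved into a single local directory. Because the listing does not pair checksums with file names, the check only tests whether each file's digest is one of the three published values. The names `md5_of` and `check_downloads` are hypothetical helpers introduced for illustration.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this record.
# The listing does not give file names, so they are kept as a plain set.
EXPECTED_MD5 = {
    "eea809324f494d19a13ca1f32f7352ba",
    "8af6a2f5bd29c711fa7c3fabfb31bf2b",
    "2a24e565ea046f685eee5a9ddcd61f42",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use low for the 72.8 MB file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_downloads(directory):
    """Map each regular file in `directory` to True if its MD5 is a published one."""
    return {
        path.name: md5_of(path) in EXPECTED_MD5
        for path in sorted(Path(directory).iterdir())
        if path.is_file()
    }
```

For example, `check_downloads("downloads")` (where `downloads` is an assumed local folder) returns a dictionary keyed by file name. Any entry mapped to `False` either was corrupted in transit or is not one of the three files in this release.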

Additional details

Funding

European Commission
BANG - The Mysterious Bang: A Language and Population Isolate Unlocks the Secrets of Interior West Africa's Lost Ethnolinguistic Diversity 101045195

Software