
Dataset Open Access

Baule speech dataset

Dougban Monsia

The dataset was created to enable research on automatic speech recognition (ASR) in the Boulé (Baule) language. It was built specifically for this task, for participation in the Google NLP Hack Series: Intro to ASR Africa Challenge hosted on the Zindi Africa platform. It contains about 565 recordings of participants reading Baule text, as spoken in Côte d’Ivoire, one sentence at a time. Each example consists of an audio file and its associated transcription. The speakers recorded the audio on their Android phones in a relatively quiet environment. The dataset is multi-speaker, containing recordings from 4 volunteers (2 male and 2 female), each of whom contributed up to 141 recordings. The recordings took place in Abidjan, Côte d’Ivoire, in April 2022.

Files (46.2 MB)
                   All versions   This version
Views              108            108
Downloads          9              9
Data volume        415.7 MB       415.7 MB
Unique views       99             99
Unique downloads   8              8

