
There is a newer version of the record available.

Published February 28, 2021 | Version 1
Dataset Restricted

SautiDB: Nigerian Accent Dataset Collection

Description

The SautiDB dataset collection project is an ongoing effort to collect speech datasets covering various Nigerian accents. The data was collected in an uncontrolled manner: anyone who visits our web app can record their voice and contribute to the dataset. The web app uses the browser's audio web API to collect voice samples. We hope this dataset will be useful to people interested in developing voice technology in Nigeria. We will continue to collect data and publish updated versions as they become available. This work grew out of our project Improving Online Experience using Accent Transfer.
 

Each filename has the form nativeLanguage_fluentLanguage_speakerID_gender_sentenceID.wav, where

  • nativeLanguage: the native (mother) language of the speaker, i.e. the language spoken by the speaker's tribe.
  • fluentLanguage: the language that the speaker feels best describes their accent.
  • speakerID: the ID assigned to the speaker. A speaker may have multiple IDs because we did not authenticate users; we only cached their browser sessions.
  • gender: the gender of the speaker. We did not collect this information from users explicitly; we hand-labeled it.
  • sentenceID: the ID of the sentence that was read. We used the CMU Arctic sentences.
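As an illustration of the naming scheme above, here is a minimal Python sketch that splits a filename into its five fields. It is not taken from the project's post-processing repository. The function name, the `SautiName` type, and the example filename are hypothetical, and the sketch assumes that the first four fields contain no underscores, so any further underscores are treated as part of sentenceID.

```python
from pathlib import Path
from typing import NamedTuple


class SautiName(NamedTuple):
    """Fields encoded in a SautiDB filename (hypothetical container type)."""
    native_language: str
    fluent_language: str
    speaker_id: str
    gender: str
    sentence_id: str


def parse_sautidb_filename(filename: str) -> SautiName:
    """Split nativeLanguage_fluentLanguage_speakerID_gender_sentenceID.wav.

    Assumption: the first four fields contain no underscores. The split is
    limited to four cuts, so a sentence ID that itself contains underscores
    stays in one piece.
    """
    stem = Path(filename).stem          # drop the directory and ".wav"
    parts = stem.split("_", 4)          # at most five fields
    if len(parts) != 5 or not all(parts):
        raise ValueError(f"unexpected filename layout: {filename!r}")
    return SautiName(*parts)


# Hypothetical filename, for illustration only; actual field values may differ.
example = parse_sautidb_filename("yoruba_english_s01_f_arctic_a0001.wav")
print(example.native_language, example.speaker_id, example.sentence_id)
```

Because a speaker may have been assigned more than one speakerID, grouping files by `speaker_id` may split one person's recordings across several groups.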


===========================
Before Postprocessing
===========================
Number of Samples: 1615
Size Webm: 59MB
Size Wav: 847MB
Sampling Rate: 48000Hz
Total Time: 2hrs 30min 21sec


============================
After Postprocessing
============================
Number of Samples: 919
Size Wav: 336MB
Sampling Rate: 48000Hz
Total Time: 0hrs 59min 08sec
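
The figures above can be turned into a few derived quantities, such as the average clip length and the fraction of samples kept by post-processing. The short Python sketch below uses only the numbers listed above; the helper name `to_seconds` is ours, and the derived values are plain arithmetic, not statistics reported by the authors.

```python
def to_seconds(hours: int, minutes: int, seconds: int) -> int:
    """Convert an h/m/s duration to whole seconds."""
    return hours * 3600 + minutes * 60 + seconds


# Values copied from the "Before Postprocessing" and "After Postprocessing" blocks.
samples_before, samples_after = 1615, 919
total_before_s = to_seconds(2, 30, 21)
total_after_s = to_seconds(0, 59, 8)

avg_before_s = total_before_s / samples_before   # mean clip length before
avg_after_s = total_after_s / samples_after      # mean clip length after
kept_fraction = samples_after / samples_before   # share of samples retained

print(f"mean clip length before: {avg_before_s:.2f} s")
print(f"mean clip length after:  {avg_after_s:.2f} s")
print(f"samples kept:            {kept_fraction:.1%}")
```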

The associated GitHub repository used for post-processing is also linked. We are grateful for funding from AI4D-IndabaX under IDRC Grant Number 109187-002.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.
