There is a newer version of the record available.

Published June 24, 2022 | Version 1.0
Dataset Open

Sudanese dialect speech dataset

Creators

  • 1. Sudan University of Science & Technology

Contributors

Supervisor:

  • 1. Sudan University of Science & Technology

Description

This is a speech dataset for the Sudanese dialect. The data were collected from YouTube videos that represent the characteristics of
   the Sudanese dialect, mainly the dialect of central Sudan (Khartoum in particular)
   with some northern tendency. The recordings come primarily from two programs, Hajj Muzakir and Dukkan Wad Elbaseer.

Transcription was done manually: the audio files were listened to repeatedly while writing the captions for the collected conversations, to make
   sure that every word is written as the speakers said it. The transcripts are written in the Arabic alphabet
   without diacritics, in a manner that reflects the Sudanese way of speaking. Noticeable mistakes were therefore
   left uncorrected, to avoid introducing bias and to keep the data representative.

The 'Dataset' subdirectory contains all the audio and text files for the corpus. The files are organized by
   program name: the prefix 'hm_' marks the Hajj Muzakir program and 'wb_' marks Dukkan Wad Elbaseer. Each filename has
   three parts: first, two letters for the program name ('hm' or 'wb'); second, the episode number;
   third, the clip number. For example, hm_01_0001.wav and wb_01_0001.wav are the first clip of the first episode of each program.
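
The naming scheme above can be split into its three parts with a short script. This is a minimal sketch, not part of the dataset: the names `parse_clip_name` and `ClipId` are hypothetical, and the pattern is inferred only from the two example filenames, so the digit widths and the file extension are assumptions and the pattern accepts any run of digits and any extension.

```python
import re
from typing import NamedTuple

# Program prefixes named in the dataset description.
PROGRAMS = {"hm": "Hajj Muzakir", "wb": "Dukkan Wad Elbaseer"}

# Pattern inferred from hm_01_0001.wav and wb_01_0001.wav (assumption):
# prefix, episode number, clip number, then any file extension.
CLIP_NAME = re.compile(r"^(hm|wb)_(\d+)_(\d+)\.(\w+)$")


class ClipId(NamedTuple):
    program: str  # full program name
    episode: int  # episode number
    clip: int     # clip number within the episode


def parse_clip_name(filename: str) -> ClipId:
    """Split a corpus filename into program, episode, and clip."""
    match = CLIP_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected filename: {filename!r}")
    prefix, episode, clip, _extension = match.groups()
    return ClipId(PROGRAMS[prefix], int(episode), int(clip))
```

Calling `parse_clip_name("hm_01_0001.wav")` yields the program name Hajj Muzakir with episode 1 and clip 1. Filenames that do not follow the scheme raise `ValueError`, which makes unexpected files in the archive easy to spot.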

Files

Sudanese_dialect_speech_dataset.zip (1.4 GB)
md5:b7d922a7226b6654bd9e93c698ae8e16