Published October 29, 2024 | Version v2
Dataset Open

Helsinki Speech Challenge 2024 open audio dataset

  • 1. Norwegian University of Science and Technology
  • 2. University of Helsinki

Description

Dataset

This training dataset was originally designed for the Helsinki Speech Challenge 2024 (HSC2024). Although it was created with this challenge in mind, it is also useful for developing and testing other audio processing algorithms.

Our dataset features clean speech samples generated by OpenAI's text-to-speech model, paired with corresponding recorded signals. These recorded signals are purposefully distorted by real-world effects such as filtering and reverb, offering a realistic testing ground for your audio processing algorithms.

For ease of use, the audio samples are organized into 10 separate zip files, each dedicated to a specific task and level of complexity. Most zip files include two folders, clean and recorded: one contains the clean audio data and the other the corresponding recorded (distorted) data. Note, however, that the folders Task_3_Level_1 and Task_3_Level_2 include only recorded data, because their clean counterparts are identical to those in Task_2_Level_2 and Task_2_Level_3, respectively. Each zip file also includes a .txt file with the original text samples associated with the audio clips.
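As a starting point for working with the clean and recorded folders, the following sketch pairs files from the two folders. It assumes, without confirmation from the dataset description, that each .wav file name ends in a numeric sample index shared by the clean file and its recorded counterpart. The function name pair_clean_and_recorded is hypothetical.

```python
import re
from pathlib import Path


def pair_clean_and_recorded(clean_dir, recorded_dir):
    """Pair clean and recorded .wav files that share a trailing numeric index.

    Assumption: file names end in digits (e.g. "..._022.wav") and the same
    number identifies the same sample in both folders.
    """

    def index_by_number(folder):
        files = {}
        for path in sorted(Path(folder).glob("*.wav")):
            match = re.search(r"(\d+)$", path.stem)
            if match:
                files[int(match.group(1))] = path
        return files

    clean = index_by_number(clean_dir)
    recorded = index_by_number(recorded_dir)
    # Keep only indices present in both folders, in ascending order.
    shared = sorted(clean.keys() & recorded.keys())
    return [(clean[k], recorded[k]) for k in shared]
```

For Task_3_Level_1 and Task_3_Level_2, the clean folder would have to be taken from the extracted Task_2_Level_2 and Task_2_Level_3 archives, as described above.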

Update 29th October: We have now also added the test data that was used for evaluation in the challenge. It has a structure similar to the rest of the data and is contained in the zip files whose names start with "Test".

Additionally, there is a folder Impulse_Responses, which contains a clean sine sweep signal and a short and a long white noise signal, each with its recorded counterpart.
To help you get started, we have also provided an example folder containing samples from each task and level, giving an overview of the dataset's scope and variety.
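A clean excitation signal and its recording, such as the pairs in Impulse_Responses, can be used to estimate the impulse response of the recording setup. The sketch below uses regularized frequency-domain deconvolution, which is one common approach and not necessarily the method used by the dataset authors. It assumes both signals are already loaded as mono NumPy arrays at the same sample rate and are time-aligned; the function name and the eps parameter are illustrative.

```python
import numpy as np


def estimate_impulse_response(clean, recorded, ir_length, eps=1e-8):
    """Estimate an impulse response h such that recorded ~ clean * h.

    Uses regularized spectral division: H = R * conj(C) / (|C|^2 + eps).
    """
    clean = np.asarray(clean, dtype=float)
    recorded = np.asarray(recorded, dtype=float)
    # Zero-pad so that the circular convolution implied by the FFT
    # does not wrap around.
    n = len(clean) + len(recorded)
    C = np.fft.rfft(clean, n)
    R = np.fft.rfft(recorded, n)
    H = R * np.conj(C) / (np.abs(C) ** 2 + eps)
    h = np.fft.irfft(H, n)
    # Keep only the first ir_length samples of the estimate.
    return h[:ir_length]
```

The eps term limits noise amplification at frequencies where the excitation has little energy. For real recordings it would typically need to be much larger than the default and tuned to the signal level.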

The dataset also contains a Python script, evaluate.py, which can be used to evaluate the quality of audio files using the Mozilla DeepSpeech speech recognition model. For details on this script, see the full description of the data challenge, either on the website below or on arXiv: https://arxiv.org/abs/2406.04123.
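When a speech recognition model is used to assess audio quality, the recognized text is typically compared against the reference transcript. One common metric for this comparison is the character error rate (CER). The sketch below computes CER with a Levenshtein edit distance. It is an illustration only: whether evaluate.py uses exactly this metric or this normalization is an assumption, and no DeepSpeech calls are shown.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance divided by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Here the reference would be a line from the .txt file shipped with each zip file, and the hypothesis would be the transcript produced by the recognizer for the processed audio.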

Important dates:

  • Data Challenge Launch: 10. June 2024.
  • Sign-up deadline: 1. September 2024 (if you missed this deadline and wish to participate in the challenge, please send us an email).
  • Submission deadline: 6. October 2024. We realize this deadline is a bit optimistic, but we kindly ask participants to try to meet it.
  • Results are published: 4. November.
  • Inverse Days: 10.-13. December in Oulu, Finland.

Here is a link to the official webpage of the HSC2024: https://blogs.helsinki.fi/helsinki-speech-challenge/

Contact Email: hsc2024@helsinki.fi

Files

Files (1.6 GB)

Checksum                               Size
md5:26e7460f9066466160e51a3a4d5fdd69   6.5 kB
md5:762098e1a8bc0ea40118722f10be750d   93.5 kB
md5:11ef85b7363a8cb1a4a9a5d5b4a1a2fb   94.3 kB
md5:06280af78b12ce5cf4876010cc4cc519   2.4 MB
md5:3138d27a280e12fd119eb06266704ccf   13.6 MB
md5:352802f4c52aee58e4ba195de4bf78ac   566 Bytes
md5:58a0a29868690e91b33a9d2f916838ce   117.4 MB
md5:bd47a7ee570e751e072f56985b17039b   120.2 MB
md5:2f9791f6e72d885cfd8107f3ae84cc3e   120.3 MB
md5:67b43fff73011418cbded2f3243ec878   125.3 MB
md5:25f20562f882423bd75503821699cd4c   123.2 MB
md5:53c75d496d44238823d42976bfab2084   125.0 MB
md5:f674ec3408e75845944d13f87daf4e5e   125.1 MB
md5:8424a87ff769071b8a121a4114ed4e4b   104.3 MB
md5:8d01344b588fc281a8d19d03ad057fe2   93.9 MB
md5:b955871fb5da0019819d4535446ae8bd   99.4 MB
md5:7b3aca42a288a39077d2545dbba599a8   69.8 MB
md5:edf9ebb1f01903f783d2cac1fd2a830c   69.9 MB
md5:c638a6b1f0826ba49e3d3870f9a1efa8   17.4 MB
md5:3ce32932e18abd1ef0acdc7e38d09123   17.3 MB
md5:dd211b1509666a448eef5f35504e7b85   17.6 MB
md5:582e0291e4b34cd27fb48516be26b8e3   18.2 MB
md5:c66c08dcd7e27e2947a3b7d6a3642a59   18.4 MB
md5:b82117ac924ed3aaf561538598979a74   18.5 MB
md5:107fdb8232b62665827e1be01433a2c0   18.1 MB
md5:efdff087f2fed520aa7286f7660f4fae   29.3 MB
md5:09ec96506c472fd750faa6095eac585d   28.9 MB
md5:d8fdba307bfac92f3fcf5efb69a83071   29.7 MB
md5:2c3b1a463bea45d4a6ed5d8297e87b14   29.6 MB
md5:2e3fa10f4fc1358f580e6902466cc6d7   29.4 MB
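
The MD5 checksums listed above can be used to check that a download completed without corruption. A minimal sketch using Python's standard hashlib module follows. The helper names are hypothetical, and because the file names are not shown in the listing, the file path has to be supplied by the user.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a listed value such as 'md5:26e7...dd69'."""
    # Accept the value with or without the leading "md5:" prefix.
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use low for the larger archives, which exceed 100 MB.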