DCASE 2024 Task 7 Dataset - Open Source
Description
This dataset supports the development and evaluation of prompt-based generative algorithms for environmental sound synthesis. It is designed for the Sound Scene Synthesis task, which consists of generating realistic environmental sound scenes from textual descriptions.
The dataset is a free and open version of the one used in the DCASE 2024 Task 7 challenge on sound scene synthesis. For a full description of the task and access to challenge results, please consult the official challenge page. An in-depth description of the challenge evaluation protocol and a detailed analysis of the results are available in [1].
Unlike the official challenge dataset, this version includes only audio sourced from Freesound and excludes any proprietary or private sound libraries.
📊 Dataset Overview
The dataset includes 310 audio clips, each 4 seconds long, along with their corresponding text prompts. Unlike typical audio captioning datasets, both the prompts and audio scenes were manually crafted and edited. This enables a more controlled and quantifiable evaluation of generative models.
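Since every clip is stated to be 4 seconds long, one quick sanity check after download is to measure each file's duration. The sketch below uses Python's standard `wave` module, which assumes the files are PCM-encoded WAV (the README states neither the encoding nor the sample rate). `clip_duration`, `find_off_length_clips` and the tolerance value are illustrative choices, not part of the dataset.

```python
import wave
from pathlib import Path

EXPECTED_SECONDS = 4.0  # clip length stated in this README


def clip_duration(path: Path) -> float:
    """Return the duration of a PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def find_off_length_clips(audio_dir: Path, tol: float = 1e-3) -> list[tuple[str, float]]:
    """List (filename, duration) for clips whose length differs from 4 s by more than tol.

    Assumption: files are PCM WAV readable by the standard `wave` module.
    """
    off = []
    for path in sorted(audio_dir.glob("*.wav")):
        duration = clip_duration(path)
        if abs(duration - EXPECTED_SECONDS) > tol:
            off.append((path.name, duration))
    return off
```

For example, `find_off_length_clips(Path("DCASE-TASK7-2024-Open-Source/eval/audio"))` should return an empty list if every clip matches the stated length.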
Prompts follow a fixed structure:
> (foreground sound source) with (background sound source) in the background
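Because prompts follow a fixed template, the foreground and background parts can be recovered with a regular expression. This is a minimal sketch assuming prompts match the template literally (optionally ending with a period) and that prompts in the "no background" category simply omit the "with ... in the background" clause; both assumptions should be checked against `caption.csv`. `split_prompt` is a hypothetical helper name.

```python
import re
from typing import Optional

# Template stated in the README: "(foreground) with (background) in the background".
# The greedy foreground group splits at the LAST " with ", so foreground phrases
# that themselves contain " with " stay intact.
PROMPT_RE = re.compile(r"^(?P<fg>.+) with (?P<bg>.+?) in the background\.?$")


def split_prompt(prompt: str) -> tuple[str, Optional[str]]:
    """Return (foreground, background); background is None if the clause is absent.

    Assumption: "no background" prompts omit the "with ... in the background" clause.
    """
    text = prompt.strip()
    match = PROMPT_RE.match(text)
    if match is None:
        return text, None
    return match.group("fg"), match.group("bg")
```

Splitting at the last " with " is a design choice: foreground actions are more likely than background descriptions to contain that word.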
Foreground sounds are action-based (e.g., a dog barking). They fall into six categories:
- animal
- vehicle
- human
- alarm
- tool
- entrance
These are paired with five possible background categories:
- crowd
- traffic
- water
- birds
- no background
> Note: Foreground vehicle sounds are not paired with the traffic background, to avoid redundancy. The "no background" category enables the evaluation of monophonic scenes containing an isolated foreground source.
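The pairing rules above define which foreground/background category combinations can occur: six foreground categories times five background options, minus the excluded vehicle/traffic pair, gives 29 combinations. The sketch below enumerates them; the category strings mirror the lists above, while `allowed_pairs` is an illustrative helper. The README does not say how many clips fall into each combination.

```python
from itertools import product

FOREGROUND = ("animal", "vehicle", "human", "alarm", "tool", "entrance")
BACKGROUND = ("crowd", "traffic", "water", "birds", "no background")

# Vehicle foregrounds are never combined with a traffic background.
EXCLUDED = {("vehicle", "traffic")}


def allowed_pairs(backgrounds=BACKGROUND) -> list[tuple[str, str]]:
    """All (foreground, background) category pairs permitted by the pairing rules."""
    return [
        (fg, bg)
        for fg, bg in product(FOREGROUND, backgrounds)
        if (fg, bg) not in EXCLUDED
    ]


print(len(allowed_pairs()))                               # 29: full category grid
print(len(allowed_pairs(("crowd", "traffic", "water"))))  # 17: backgrounds used in the dev set
```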
The dataset is split into a development set and an evaluation set:
- Development Set: 60 audio–caption pairs (backgrounds: crowd, traffic, water)
- Evaluation Set: 250 audio–caption pairs (backgrounds: crowd, traffic, water, birds, no background)
📁 Folder Structure
Inside the `DCASE-TASK7-2024-Open-Source/` folder:
```
DCASE-TASK7-2024-Open-Source/
├── dev/
│   ├── audio/
│   └── caption.csv
├── eval/
│   ├── audio/
│   └── caption.csv
```
- `audio/`: Contains the audio clips in WAV format.
- `caption.csv`: Provides the corresponding prompt for each audio file.
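The layout above lends itself to a small loader that pairs each caption with its audio file. The README does not document the column names of `caption.csv`, so this sketch takes them as arguments (inspect the header with `csv_columns` first). It also assumes the file column holds names relative to `audio/`, possibly without the `.wav` extension. The expected pair counts come from the split description above; `load_split` and `csv_columns` are hypothetical helpers.

```python
import csv
from pathlib import Path

EXPECTED_PAIRS = {"dev": 60, "eval": 250}  # counts stated in this README


def csv_columns(csv_path: Path) -> list[str]:
    """Return the header row of a caption.csv so the real column names can be checked."""
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        return next(csv.reader(f))


def load_split(root: Path, split: str, file_col: str, caption_col: str) -> list[tuple[Path, str]]:
    """Pair each caption in <root>/<split>/caption.csv with its WAV file in <root>/<split>/audio/.

    file_col and caption_col must be the actual header names (not documented in the README).
    """
    split_dir = root / split
    audio_dir = split_dir / "audio"
    pairs = []
    with open(split_dir / "caption.csv", newline="", encoding="utf-8-sig") as f:
        for row in csv.DictReader(f):
            wav = audio_dir / row[file_col]
            # Assumption: the file column may omit the ".wav" extension.
            if wav.suffix == "":
                wav = wav.with_suffix(".wav")
            if not wav.is_file():
                raise FileNotFoundError(wav)
            pairs.append((wav, row[caption_col]))
    # Sanity check against the split sizes given in the README.
    expected = EXPECTED_PAIRS.get(split)
    if expected is not None and len(pairs) != expected:
        raise ValueError(f"{split}: found {len(pairs)} pairs, expected {expected}")
    return pairs
```

A typical first step is `print(csv_columns(Path("DCASE-TASK7-2024-Open-Source/dev/caption.csv")))`, then passing the printed column names to `load_split`.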
📎 Citation
If you use this dataset in your research, please cite it as:
Tailleur, Modan; Lee, Junwon; Heller, Laurie; Choi, Keunwoo; McFee, Brian; Lagrange, Mathieu; Imoto, Keisuke; Okamoto, Yuki.
DCASE 2024 Task 7 Dataset - Open Source. Zenodo, 2024. DOI: 10.5281/zenodo.15630417
```bibtex
@misc{dcase2024task7opensource,
  title     = {DCASE 2024 Task 7 Dataset - Open Source},
  author    = {Tailleur, Modan and Lee, Junwon and Heller, Laurie and Choi, Keunwoo and McFee, Brian and Lagrange, Mathieu and Imoto, Keisuke and Okamoto, Yuki},
  year      = {2024},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.15630417}
}
```
📚 References
[1] Lee, Junwon; Tailleur, Modan; Heller, Laurie M.; Choi, Keunwoo; Lagrange, Mathieu; McFee, Brian; Imoto, Keisuke; Okamoto, Yuki.
Challenge on Sound Scene Synthesis: Evaluating Text-to-Audio Generation. In Audio Imagination: NeurIPS 2024 Workshop on AI-Driven Speech, Music, and Sound Generation, 2024.
Files
| Name | Size | MD5 |
|---|---|---|
| DCASE-TASK7-2024-Open-Source.zip | 146.8 MB | f7413cf80f644ea0e3e84b76060877ac |
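The MD5 checksum listed for the archive can be used to confirm that the download is complete and uncorrupted before unpacking. A minimal sketch using Python's `hashlib`, reading the file in chunks so the roughly 147 MB archive is never held in memory at once; `md5sum` and `verify_archive` are illustrative helper names.

```python
import hashlib
from pathlib import Path

EXPECTED_MD5 = "f7413cf80f644ea0e3e84b76060877ac"  # checksum listed for the archive


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: Path) -> bool:
    """True if the file's MD5 matches the published checksum."""
    return md5sum(path) == EXPECTED_MD5
```

Assuming the archive sits in the current directory, `verify_archive(Path("DCASE-TASK7-2024-Open-Source.zip"))` should return `True` for an intact download.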