DCASE 2024 Task 7 Dataset - Open Source

Tailleur, Modan; Lee, Junwon; Heller, Laurie; Choi, Keunwoo; McFee, Brian; Lagrange, Mathieu; Imoto, Keisuke; Okamoto, Yuki

doi:10.5281/zenodo.15630417

Published June 10, 2025 | Version v1

Dataset Open

1. Laboratoire des Sciences du Numérique de Nantes
2. Gaudio Lab Inc.
3. Carnegie Mellon University
4. Gaudio Lab Inc.
5. New York University
6. Doshisha University
7. Ritsumeikan University

This dataset supports the development and evaluation of prompt-based generative algorithms for environmental sound synthesis. It is designed for the Sound Scene Synthesis task, which consists of generating realistic environmental sound scenes from textual descriptions.

The dataset is a free and open version of the one used in the DCASE 2024 Task 7 challenge on sound scene synthesis. For a full description of the task and access to challenge results, please consult the official challenge page. An in-depth description of the challenge evaluation protocol and a detailed analysis of the results are available in [1].

Unlike the official challenge dataset, this version includes only audio sourced from Freesound and excludes any proprietary or private sound libraries.

📊 Dataset Overview

The dataset includes 310 audio clips, each 4 seconds long, along with their corresponding text prompts. Unlike typical audio captioning datasets, both the prompts and audio scenes were manually crafted and edited. This enables a more controlled and quantifiable evaluation of generative models.

Prompts follow a fixed structure:

> (foreground sound source) with (background sound source) in the background

Foreground sounds are action-based (e.g., a dog barking). They fall into six categories:

- animal
- vehicle
- human
- alarm
- tool
- entrance

These are paired with five possible background categories:

- crowd
- traffic
- water
- birds
- no background

> Note: Foreground vehicle sounds are not paired with the traffic background to avoid redundancy. The no background category enables the evaluation of monophonic scenes with isolated foreground sources.
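
The pairing rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the function names `valid_pairs` and `build_prompt` are hypothetical, the example descriptions ("a dog barking", "birds chirping") are placeholders, and the wording used for a scene with no background is an assumption, since only the two-source template is given here.

```python
from itertools import product

# Category names as listed in the dataset description.
FOREGROUNDS = ["animal", "vehicle", "human", "alarm", "tool", "entrance"]
BACKGROUNDS = ["crowd", "traffic", "water", "birds", "no background"]


def valid_pairs(backgrounds=BACKGROUNDS):
    """Return every (foreground, background) category pair except vehicle + traffic."""
    return [
        (fg, bg)
        for fg, bg in product(FOREGROUNDS, backgrounds)
        if not (fg == "vehicle" and bg == "traffic")
    ]


def build_prompt(foreground, background=None):
    """Fill the fixed prompt template.

    Returning the bare foreground description when there is no background
    is an assumption; the description does not show that wording.
    """
    if background is None:
        return foreground
    return f"{foreground} with {background} in the background"


eval_pairs = valid_pairs()                             # all five backgrounds
dev_pairs = valid_pairs(["crowd", "traffic", "water"])  # development backgrounds only
print(len(eval_pairs), len(dev_pairs))  # 29 17
print(build_prompt("a dog barking", "birds chirping"))
# a dog barking with birds chirping in the background
```

These counts are category combinations only. How many clips fall into each combination is not stated in the description.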

The dataset is split into a development set and an evaluation set:

- Development Set: 60 audio–caption pairs (backgrounds: crowd, traffic, water)
- Evaluation Set: 250 audio–caption pairs (backgrounds: crowd, traffic, water, birds, no background)

📁 Folder Structure

Inside the DCASE-TASK7-2024-Open-Source/ folder:

DCASE-TASK7-2024-Open-Source/
├── dev/
│   ├── audio/
│   └── caption.csv
└── eval/
    ├── audio/
    └── caption.csv

- audio/: Contains the audio clips in WAV format.
- caption.csv: Provides the text prompt for each audio file.
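
As a rough sketch of how this layout might be read, the snippet below lists the WAV files of a split and loads the raw rows of its caption.csv using only the Python standard library. The helper name `load_split` is hypothetical. The column layout of caption.csv is not documented here, so the code returns the rows untouched rather than assuming header names. The lowercase `.wav` extension and UTF-8 encoding are also assumptions.

```python
import csv
from pathlib import Path


def load_split(root, split):
    """Return (sorted WAV paths, raw caption.csv rows) for the 'dev' or 'eval' split."""
    split_dir = Path(root) / split
    # Assumes files use a lowercase ".wav" extension.
    wav_paths = sorted((split_dir / "audio").glob("*.wav"))
    # Assumes UTF-8 text; rows are returned as plain lists of strings.
    with open(split_dir / "caption.csv", newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    return wav_paths, rows


if __name__ == "__main__":
    root = Path("DCASE-TASK7-2024-Open-Source")  # assumed extraction location
    if root.is_dir():
        for split in ("dev", "eval"):
            wavs, rows = load_split(root, split)
            # The first row may or may not be a header; inspect it before indexing.
            print(split, len(wavs), "wav files,", len(rows), "csv rows")
            print("first row:", rows[0] if rows else None)
```

Printing the first row before relying on any column index is a cheap way to learn how file names and prompts are actually stored.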

📎 Citation

If you use this dataset in your research, please cite it as:

Tailleur, Modan; Lee, Junwon; Heller, Laurie; Choi, Keunwoo; McFee, Brian; Lagrange, Mathieu; Imoto, Keisuke; Okamoto, Yuki.
DCASE 2024 Task 7 Dataset - Open Source. Zenodo, 2024. DOI: 10.5281/zenodo.15630417

@misc{dcase2024task7opensource,
title = {DCASE 2024 Task 7 Dataset - Open Source},
author = {Tailleur, Modan and Lee, Junwon and Heller, Laurie and Choi, Keunwoo and McFee, Brian and Lagrange, Mathieu and Imoto, Keisuke and Okamoto, Yuki},
year = {2024},
publisher = {Zenodo},
doi = {10.5281/zenodo.15630417}
}

📚 References

[1] Lee, Junwon; Tailleur, Modan; Heller, Laurie M.; Choi, Keunwoo; Lagrange, Mathieu; McFee, Brian; Imoto, Keisuke; Okamoto, Yuki.
Challenge on Sound Scene Synthesis: Evaluating Text-to-Audio Generation. In Audio Imagination: NeurIPS 2024 Workshop on AI-Driven Speech, Music, and Sound Generation, 2024.

Files (146.8 MB)

Name	Size
DCASE-TASK7-2024-Open-Source.zip (md5: f7413cf80f644ea0e3e84b76060877ac)	146.8 MB
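
The MD5 checksum listed for the archive can be used to confirm that a download is intact. The sketch below is one way to do that with Python's `hashlib`. The helper name `md5_of_file` and the assumption that the archive sits in the current directory are illustrative.

```python
import hashlib
from pathlib import Path

# Checksum copied from the file listing of this record.
EXPECTED_MD5 = "f7413cf80f644ea0e3e84b76060877ac"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    archive = Path("DCASE-TASK7-2024-Open-Source.zip")  # assumed download location
    if archive.is_file():
        ok = md5_of_file(archive) == EXPECTED_MD5
        print("checksum OK" if ok else "checksum mismatch")
```

Reading in chunks keeps memory use flat regardless of archive size.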
