Published May 30, 2025 | Version v1
Dataset Open

DCASE2025Task4EvaluationDataset : The Evaluation Dataset for Spatial Semantic Segmentation of Sound Scenes

  • 1. NTT Corporation

Description

This dataset was recorded and designed for the Spatial Semantic Segmentation of Sound Scenes (S5) challenge task of the DCASE2025 Challenge. The development set for this dataset is available here.

This dataset consists of soundscapes generated by mixing 1 to 3 samples from the 18 sound event classes, recorded in an anechoic chamber and convolved with newly recorded room impulse responses (RIRs). Because the dataset is designed for evaluation, it does not include the individual sound events, RIRs, or noise recordings themselves, only the synthesized soundscapes. It contains 2290 soundscapes of 10 seconds each. Of these, the first 1620 files (eval_0000.wav to eval_1619.wav) are used to calculate the ranking for DCASE 2025 Challenge Task 4; the remaining files are used for task analysis. All acoustic data and RIRs are in 32 kHz/16-bit format. The rest of this description briefly summarizes the recording of the sound events and RIRs.
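The split between ranking and analysis files described above can be sketched as follows. This is a minimal illustration, not an official tool: the constant and function names are hypothetical, and the assumption that files after eval_1619.wav continue the same four-digit naming pattern is not confirmed by this description. The format check uses only the Python standard library.

```python
import wave

N_TOTAL = 2290    # total number of soundscapes stated in the description
N_RANKING = 1620  # the first 1620 files are used for the challenge ranking


def eval_filename(index: int) -> str:
    """Return the file name for a soundscape index, e.g. 0 -> 'eval_0000.wav'."""
    return f"eval_{index:04d}.wav"


# Files used for the ranking: eval_0000.wav to eval_1619.wav.
ranking_files = [eval_filename(i) for i in range(N_RANKING)]

# Remaining files used for task analysis.
# ASSUMPTION: the naming pattern continues unchanged beyond eval_1619.wav.
analysis_files = [eval_filename(i) for i in range(N_RANKING, N_TOTAL)]


def check_format(path: str) -> bool:
    """Return True if the WAV file is sampled at 32 kHz with 16-bit samples."""
    with wave.open(path, "rb") as wf:
        return wf.getframerate() == 32000 and wf.getsampwidth() == 2
```

Running `check_format` over the extracted files is a quick way to confirm that a download matches the stated 32 kHz/16-bit format before evaluation.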

Below are the details of the sound events, RIRs, and noise used to synthesize the soundscapes.

Ground truth dataset (updated 2025/8/5)

We have published the ground truth for this evaluation dataset at Ground truth for DCASE2025Task4EvaluationDataset.
Using this ground truth dataset and nttrd-cslabs/dcase2025_SpatialSemanticSegmentation_evaluator, you can evaluate performance on the S5 task with the evaluation set.

Anechoic Sound Event (ASE) for Evaluation

This is an isolated sound event dataset for evaluation purposes, released under the same specifications as ASE1K, which is available in the development set. The recordings were made using three cardioid microphones to capture the sound events from the left, front, and right, and one omnidirectional microphone to capture the sound from above. In the S5 task, a single channel (e.g. ch=3) is assumed to be selected from these and used as a monaural sound event. For each class, around 20 events were recorded.
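Selecting a single channel from a four-channel recording can be sketched as below, using only the Python standard library. The function names are hypothetical, and the channel index is treated as zero-based, which is an assumption: this description does not state the indexing convention or which microphone ch=3 corresponds to.

```python
import array
import wave


def select_channel(interleaved: array.array, n_channels: int, ch: int) -> array.array:
    """Pick one channel out of interleaved 16-bit samples.

    ASSUMPTION: `ch` is a zero-based channel index.
    """
    if not 0 <= ch < n_channels:
        raise ValueError(f"channel {ch} out of range for {n_channels} channels")
    # Interleaved layout is [c0, c1, ..., c0, c1, ...], so stride by n_channels.
    return interleaved[ch::n_channels]


def extract_mono(in_path: str, out_path: str, ch: int = 3) -> None:
    """Write one channel of a 16-bit multichannel WAV file as a mono WAV file."""
    with wave.open(in_path, "rb") as src:
        if src.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        n_channels = src.getnchannels()
        rate = src.getframerate()
        samples = array.array("h")
        samples.frombytes(src.readframes(src.getnframes()))
    mono = select_channel(samples, n_channels, ch)
    with wave.open(out_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(mono.tobytes())
```

Because the samples are only de-interleaved and never modified, the sample rate and bit depth of the output match the input.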

FOA RIR for Evaluation

The RIR dataset consists of RIRs recorded in six environments for DCASE2025 Task 4. These recording environments differ from those of the RIRs included in the development set. All recordings were made using the same FOA microphone (Sennheiser Ambeo VR Mic). RIRs were recorded from multiple locations in each environment and compiled in the SOFA file format.

Noise recordings for Evaluation

This dataset also includes noise recordings in the FOA format. All noise recordings were newly made for this evaluation using the FOA microphone (Sennheiser Ambeo VR Mic).

Further information is available in [1], the task description, and GitHub.

[1] Masahiro Yasuda, Binh Thien Nguyen, Noboru Harada, Romain Serizel, Mayank Mishra, Marc Delcroix, Shoko Araki, Daiki Takeuchi, Daisuke Niizumi, Yasunori Ohishi, Tomohiro Nakatani, Takao Kawamura, Nobutaka Ono, "Description and discussion on DCASE 2025 challenge task 4: Spatial Semantic Segmentation of Sound Scenes," arXiv preprint arXiv:xxxx.xxxx, 2025. (will be available on 6/10)

Files

LISENCEv2.2.pdf

Files (4.6 GB)

Checksum Size
md5:0504c5c1941138d86cffaf93d7020b5f 536.9 MB
md5:9b8873e36514fc1511e35d9dd10c69d6 536.9 MB
md5:1e34644c3e210f3b320e6a147251a626 536.9 MB
md5:fa783cc524e2160e1f21db75f691f490 536.9 MB
md5:cbc7c094841029992706c9b961e912e0 536.9 MB
md5:71c933801ae58f003b01f6a9eb3a6617 536.9 MB
md5:fe63ba987d7238cbb9a43d9e75b13f54 536.9 MB
md5:0c4b920a02e1731cd8d967361dcb2ba7 536.9 MB
md5:74c079eb7a296b4a7974ab342faf0dc0 344.7 MB
md5:154227c0241baf9a9ab4ee88df58475e 310.9 kB
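
The md5 checksums listed above can be used to verify downloaded files. The following is a minimal sketch using Python's standard hashlib module; the function names are hypothetical, and because the file names are not shown in this listing, the caller has to pair each local path with the matching checksum.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: str, listed: str) -> bool:
    """Compare a file against a listed value such as 'md5:0504c5c1...'."""
    # Accept the value with or without the 'md5:' prefix, in any letter case.
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again.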