DCASE2025Task4EvaluationDataset : The Evaluation Dataset for Spatial Semantic Segmentation of Sound Scenes
Authors/Creators
- 1. NTT Corporation
Description
This dataset was recorded and designed for the Spatial Semantic Segmentation of Sound Scenes (S5) challenge task of the DCASE2025 Challenge. The development set for this dataset is available here.
This dataset consists of soundscapes generated by mixing 1 to 3 samples from the 18 sound event classes, recorded in an anechoic chamber, and convolving them with newly recorded room impulse responses (RIRs). Note that since this dataset is designed for evaluation, it does not include the individual sound events, RIRs, or noise, only the synthesized soundscapes. There are 2290 soundscape files, each 10 seconds long. Of these, the first 1620 files (eval_0000.wav, ..., eval_1619.wav) will be used to calculate the ranking for DCASE Challenge 2025 Task 4. The remaining files are used for task analysis. All acoustic data and RIRs included in this dataset are in 32 kHz/16-bit format. The rest of this description briefly summarizes the recording of the sound events and RIRs.
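The split between ranking files and analysis files can be sketched as follows. This is a minimal illustration, not part of the official tooling: the function name `split_eval_files` is hypothetical, and the listing assumes that the remaining files continue the same `eval_NNNN.wav` numbering up to 2289, which the description does not state explicitly.

```python
from pathlib import Path

# eval_0000.wav .. eval_1619.wav are used for ranking, per the description.
NUM_RANKING_FILES = 1620


def split_eval_files(filenames):
    """Split soundscape file names into (ranking, analysis) lists by index."""
    ranking, analysis = [], []
    for name in sorted(filenames):
        index = int(Path(name).stem.split("_")[1])  # "eval_0042" -> 42
        (ranking if index < NUM_RANKING_FILES else analysis).append(name)
    return ranking, analysis


# Hypothetical listing: assumes all 2290 files share the eval_NNNN.wav pattern.
names = [f"eval_{i:04d}.wav" for i in range(2290)]
ranking, analysis = split_eval_files(names)
print(len(ranking), len(analysis))  # 1620 670
```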
Below are the details of the sound events, RIRs, and noise used to synthesize the soundscapes.
Ground truth dataset (updated 2025/8/5)
Anechoic Sound Event (ASE) for Evaluation
This is an isolated sound event dataset for evaluation purposes, released under the same specifications as ASE1K, which is available in the development set. The recording was made using three cardioid microphones to capture the sound events from the left, front and right, and one omnidirectional microphone to capture the sound from above. In the S5 task, it is assumed that you will simply select a single channel (e.g. ch=3) from these and use it as a monaural sound event. For each class, around 20 events were recorded.
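Selecting a single channel from a multichannel recording could look like the sketch below. It assumes the recordings are stored as 4-channel, 16-bit PCM WAV files and that channel indices are zero-based; neither detail is confirmed by this description, so whether "ch=3" refers to the fourth or the third channel should be checked against the data. The helper name `read_channel` is hypothetical, and the demo uses a tiny synthetic file rather than real data.

```python
import io
import struct
import wave


def read_channel(wav_file, channel):
    """Read one channel from a 16-bit PCM WAV file as a list of ints."""
    with wave.open(wav_file, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        num_channels = wav.getnchannels()
        data = wav.readframes(wav.getnframes())
    # WAV samples are little-endian and interleaved: ch0, ch1, ..., ch0, ch1, ...
    samples = struct.unpack("<%dh" % (len(data) // 2), data)
    return list(samples[channel::num_channels])


# Synthetic stand-in: two frames of a 4-channel, 32 kHz, 16-bit recording.
buf = io.BytesIO()
with wave.open(buf, "wb") as wav:
    wav.setnchannels(4)
    wav.setsampwidth(2)
    wav.setframerate(32000)
    wav.writeframes(struct.pack("<8h", 0, 1, 2, 3, 10, 11, 12, 13))
buf.seek(0)

mono = read_channel(buf, channel=3)  # zero-based index (assumption)
print(mono)  # [3, 13]
```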
FOA RIR for Evaluation
The RIR dataset is made up of RIRs recorded in six environments for DCASE2025 Task4. These recording environments differ from those of the RIRs included in the development set. All recordings were made using the same FOA microphone (Sennheiser Ambeo VR Mic). RIRs were recorded from multiple locations in each environment and are compiled in SOFA file format.
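To illustrate how a monaural sound event becomes a four-channel FOA signal, the sketch below convolves a mono sequence with each channel of an RIR. It is a toy example under stated assumptions: the RIR is given as four plain Python lists rather than loaded from a SOFA file, the values are made up, and the direct-form loop is written for clarity, not speed. It is not the organizers' synthesis pipeline.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def spatialize(mono, foa_rir):
    """Convolve a mono event with each of the four FOA RIR channels."""
    return [convolve(mono, channel_rir) for channel_rir in foa_rir]


# Toy data (hypothetical values): a 2-sample event and a 3-tap, 4-channel RIR.
mono_event = [1.0, 0.5]
toy_rir = [
    [1.0, 0.0, 0.25],
    [0.5, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
foa = spatialize(mono_event, toy_rir)
print(foa[0])  # [1.0, 0.5, 0.25, 0.125]
```

For real 10-second signals at 32 kHz, an FFT-based convolution from a numerical library would be the practical choice.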
Noise recordings for Evaluation
This dataset also includes noise recordings in FOA format. All noise recordings were newly made for this evaluation using an FOA microphone (Sennheiser Ambeo VR Mic).
Further information is available in [1], the task description, and the GitHub repository.
[1] Masahiro Yasuda, Binh Thien Nguyen, Noboru Harada, Romain Serizel, Mayank Mishra, Marc Delcroix, Shoko Araki, Daiki Takeuchi, Daisuke Niizumi, Yasunori Ohishi, Tomohiro Nakatani, Takao Kawamura, Nobutaka Ono, "Description and discussion on DCASE 2025 challenge task 4: Spatial Semantic Segmentation of Sound Scenes," in arXiv preprint arXiv:xxxx.xxxx, 2025. (will be available at 6/10)
Files
LISENCEv2.2.pdf
Files
(4.6 GB)
| MD5 checksum | Size |
|---|---|
| 0504c5c1941138d86cffaf93d7020b5f | 536.9 MB |
| 9b8873e36514fc1511e35d9dd10c69d6 | 536.9 MB |
| 1e34644c3e210f3b320e6a147251a626 | 536.9 MB |
| fa783cc524e2160e1f21db75f691f490 | 536.9 MB |
| cbc7c094841029992706c9b961e912e0 | 536.9 MB |
| 71c933801ae58f003b01f6a9eb3a6617 | 536.9 MB |
| fe63ba987d7238cbb9a43d9e75b13f54 | 536.9 MB |
| 0c4b920a02e1731cd8d967361dcb2ba7 | 536.9 MB |
| 74c079eb7a296b4a7974ab342faf0dc0 | 344.7 MB |
| 154227c0241baf9a9ab4ee88df58475e | 310.9 kB |
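The MD5 checksums in the file listing can be used to check downloads for corruption. The sketch below shows one way to do this with the Python standard library. The helper names `md5_of_file` and `verify` are hypothetical, and the file path must be supplied by the user since file names are not shown in this listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file at `path` matches the expected MD5 checksum."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `verify(path_to_download, "0504c5c1941138d86cffaf93d7020b5f")` should return `True` for an intact copy of the first file in the listing, where `path_to_download` is the local path of that file.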