Published April 5, 2024 | Version v1
Dataset Open

[DCASE2024 Task 3] Synthetic SELD mixtures for baseline training

Description

DESCRIPTION:

This audio dataset serves as supplementary material for the DCASE2024 Challenge Task 3: Audio and Audiovisual Sound Event Localization and Detection with Distance Estimation. The dataset consists of synthetic spatial audio mixtures of sound events, spatialized for two different spatial formats using real room impulse responses (RIRs) measured in various spaces of Tampere University (TAU). The mixtures are generated using the same process as the one used to generate the recordings of the TAU-NIGENS Spatial Sound Scenes 2021 dataset for the DCASE2021 Challenge Task 3.

The SELD task setup in DCASE2024 is based on spatial recordings of real scenes, captured in the STARSS23 dataset. Since the task setup allows the use of external data, these synthetic mixtures serve as additional training material for the baseline model. For more details on the task setup, please refer to the task description.

Note that the generator code and the collection of room responses used to spatialize sound samples will also be made available soon. For more details on the recording of RIRs, spatialization, and generation, see:

  • Archontis Politis, Sharath Adavanne, Daniel Krause, Antoine Deleforge, Prerak Srivastava, Tuomas Virtanen (2021). A Dataset of Dynamic Reverberant Sound Scenes with Directional Interferers for Sound Event Localization and Detection. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021), Barcelona, Spain.

available here.

SPECIFICATIONS:

  • 13 target sound classes (see task description for details)
  • The sound event samples are sourced from the FSD50K dataset, based on the affinity of the labels in that dataset to the target classes. The selection consisted of identifying which labels in FSD50K corresponded to the target classes, then selecting samples that were tagged with only those labels and that had an annotator rating of Present and Predominant (see FSD50K for more details). The list of the selected files is included here.
  • 1200 1-minute long spatial recordings
  • Sampling rate of 24 kHz
  • Two 4-channel recording formats, first-order Ambisonics (FOA) and tetrahedral microphone array (MIC)
  • Sound events spatialized in 9 unique rooms, using measured RIRs for the two formats
  • Maximum polyphony of 3 (with possible same-class events overlapping)
  • Even though the whole set is used for training the baseline without distinction between the mixtures, we have included a separation into a training and a testing split, in case one needs to test the performance purely on those synthetic conditions (for example, for comparisons with training on mixed synthetic-real data, fine-tuning on real data, or training on real data only).
  • The training split is indicated as fold1 in the dataset. It contains 900 recordings spatialized in 6 rooms (150 recordings/room) and is based on samples from the development set of FSD50K.
  • The testing split is indicated as fold2 in the dataset. It contains 300 recordings spatialized in 3 rooms (100 recordings/room) and is based on samples from the evaluation set of FSD50K.
  • Common metadata files for both formats are provided. For the file naming and the metadata format, refer to the task setup.
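
As a quick sanity check on the specifications above, the following Python sketch reads the header of a recording to confirm the 24 kHz sampling rate and 4-channel layout, and groups files into the two splits. It assumes the audio is stored as plain PCM WAV readable by Python's standard wave module, and that file names begin with fold1 or fold2 followed by an underscore. Both are assumptions made for illustration (refer to the task setup for the actual naming), and the function names are hypothetical.

```python
import wave
from pathlib import Path

EXPECTED_RATE = 24000     # Hz, from the specifications above
EXPECTED_CHANNELS = 4     # both FOA and MIC are 4-channel formats


def check_recording(path):
    """Return (ok, duration_in_seconds) based on the WAV header of one file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    ok = rate == EXPECTED_RATE and channels == EXPECTED_CHANNELS
    return ok, duration


def split_by_fold(paths):
    """Group files into training (fold1) and testing (fold2) lists.

    Assumption: each file name starts with 'fold1_' or 'fold2_'.
    Files that do not match either prefix are ignored.
    """
    splits = {"fold1": [], "fold2": []}
    for p in map(Path, paths):
        fold = p.name.split("_")[0]
        if fold in splits:
            splits[fold].append(p)
    return splits
```

With the full dataset, one would expect 900 files per format under fold1 and 300 under fold2, each lasting about 60 seconds.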

 

DOWNLOAD INSTRUCTIONS:

Download the zip files and use your preferred compression tool to unzip them. To extract a split zip archive (with parts named zip, z01, z02, ...), you could use, for example, the following syntax in a Linux or macOS terminal:

  1. Combine the split archive to a single archive:
    zip -s 0 split.zip --out single.zip
  2. Extract the single archive using unzip:
    unzip single.zip
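
Before combining the parts, it may be worth verifying each downloaded file against the MD5 checksums listed under Files. The following is a minimal Python sketch of such a check using only the standard hashlib module. The function names are hypothetical, and the mapping from checksum to file name has to be taken from the download page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.split(":")[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

Reading in chunks keeps memory use low for the multi-gigabyte parts. A part whose check returns False is likely incomplete or corrupted and should be downloaded again before running zip -s 0.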

Files

Files (18.9 GB)

Checksum (MD5), Size
md5:c651f4a9670326097361763015d955f1, 4.7 GB
md5:3199250c0c7fa718888f74a7d3325400, 4.7 GB
md5:203df900f0275d8c34f04fa508b60a85, 4.7 GB
md5:12306d0d398327847734896eae3986c1, 4.7 GB
md5:d1e53c4c21dab44a1a6a6c150bc90e3f, 140.1 MB

Additional details

Related works

Is new version of
Dataset: https://zenodo.org/records/6406873 (Other)
Requires
Dataset: https://doi.org/10.5281/zenodo.6387880 (Other)

References

  • Archontis Politis, Sharath Adavanne, Daniel Krause, Antoine Deleforge, Prerak Srivastava, Tuomas Virtanen (2021). A Dataset of Dynamic Reverberant Sound Scenes with Directional Interferers for Sound Event Localization and Detection. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021), Barcelona, Spain.