Published February 28, 2021 | Version 1.2.0
Dataset Open

TAU-NIGENS Spatial Sound Events 2021

  • 1. Tampere University

Description

DESCRIPTION:

The TAU-NIGENS Spatial Sound Events 2021 dataset contains multiple spatial sound-scene recordings, consisting of sound events of distinct categories integrated into a variety of acoustical spaces, from multiple source directions and distances as seen from the recording position. The spatialization of all sound events is based on filtering through real spatial room impulse responses (RIRs), captured in multiple rooms of various shapes, sizes, and acoustical absorption properties. Furthermore, each scene recording is delivered in two spatial recording formats: a microphone array format (MIC) and a first-order Ambisonics format (FOA). The sound events are spatialized either as stationary sound sources in the room or as moving sound sources, in which case time-variant RIRs are used. Each sound event in the sound scene is associated with a single direction-of-arrival (DoA) if static, a trajectory of DoAs if moving, and a temporal onset and offset time. The isolated sound event recordings used for the synthesis of the sound scenes are obtained from the NIGENS general sound events database. These recordings serve as the development dataset for the Sound Event Localization and Detection Task of the DCASE 2021 Challenge.
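As an illustration of the DoA labels described above, the following minimal sketch converts a DoA to a Cartesian unit vector and measures the angle between two DoAs. It assumes DoAs are given as (azimuth, elevation) pairs in degrees, with elevation measured from the horizontal plane; the function names and the example values are hypothetical, and the included README remains the reference for the actual label convention.

```python
import math

def doa_to_unit_vector(azimuth_deg, elevation_deg):
    """Convert an (azimuth, elevation) DoA in degrees to a Cartesian unit vector.

    Assumed convention: elevation is measured from the horizontal plane.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))

def angular_distance_deg(doa_a, doa_b):
    """Great-circle angle in degrees between two (azimuth, elevation) DoAs."""
    va = doa_to_unit_vector(*doa_a)
    vb = doa_to_unit_vector(*doa_b)
    dot = sum(a * b for a, b in zip(va, vb))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(dot))

# A static event carries a single DoA; a moving event carries a sequence of them.
# These values are made up for illustration only.
static_doa = (30.0, 0.0)
moving_trajectory = [(30.0, 0.0), (40.0, 0.0), (50.0, 0.0)]

# Angle swept by the hypothetical moving event between its first and last DoA.
swept = angular_distance_deg(moving_trajectory[0], moving_trajectory[-1])
```

The same angular distance can be used to compare an estimated DoA against a reference one when evaluating a localization method.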

This dataset is the third iteration of spatialized sound event datasets based on real room responses and ambient noise from multiple spaces, with each iteration introducing more challenging conditions that are closer to real life. Those iterations, including the present one, are:

  • TAU Spatial Sound Events 2019, development and evaluation datasets.
    5 rooms, high direct-to-reverberant ratios (DRR), static sources only, minimum DoA separation 10°, discrete grid of DoAs, high SNR for ambient noise, maximum polyphony of 2 simultaneous events
  • TAU-NIGENS Spatial Sound Events 2020, development and evaluation datasets.
    13 rooms, low-to-high DRRs, static and moving sources, continuous DoAs, low-to-high SNR for ambient noise,
    maximum polyphony of 2 simultaneous events
  • TAU-NIGENS Spatial Sound Events 2021.
    Same as 2020, with the following exceptions: a more natural temporal distribution of sound events,
    maximum polyphony of 3 target events, inclusion of additional out-of-target-classes directional interference events

The inclusion of directional interferences is the main new challenging property of the new dataset. They are spatialized in the scene in the same way as the target events, and can be either static or moving. The interfering events are sourced from the "engine", "fire", and "general" classes of the NIGENS sound event database. The interferers are considered unknown and no activity or directional labels of them are provided with the training datasets.

REPORT & REFERENCE:

If you use this dataset, please cite the report on its creation and the corresponding DCASE2021 task setup:

Archontis Politis, Sharath Adavanne, Daniel Krause, Antoine Deleforge, Prerak Srivastava, Tuomas Virtanen (2021).
A Dataset of Dynamic Reverberant Sound Scenes with Directional Interferers for Sound Event Localization and Detection. 
In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021), Barcelona, Spain.

available here.

AIM:

The dataset includes a large number of mixtures of sound events with realistic spatial properties under different acoustic conditions, and hence it is suitable for training and evaluation of machine-listening models for sound event detection (SED), general sound source localization with diverse sounds or signal-of-interest localization, and joint sound-event-localization-and-detection (SELD). Additionally, the dataset can be used for evaluation of signal processing methods that do not necessarily rely on training, such as acoustic source localization methods and multiple-source acoustic tracking. The dataset allows evaluation of the performance and robustness of the aforementioned applications for diverse types of sounds, and under diverse acoustic conditions.

SPECIFICATIONS:

  • 600 one-minute long sound scene recordings with metadata (development dataset).
  • 200 one-minute long sound scene recordings without metadata (evaluation dataset).
  • Sampling rate 24 kHz.
  • About 500 sound event samples distributed over the 12 target classes (see [here](http://doi.org/10.5281/zenodo.2535878) for more details).
  • About 400 sound event samples used as interference events (see [here](http://doi.org/10.5281/zenodo.2535878) for more details).
  • Two 4-channel 3-dimensional recording formats: first-order Ambisonics (FOA) and tetrahedral microphone array.
  • Realistic spatialization and reverberation through multichannel RIRs collected in 13 different enclosures.
  • From 1184 to 6480 possible RIR positions across the different rooms.
  • Both static reverberant and moving reverberant sound events.
  • Three possible angular speeds for moving sources of approximately 10, 20, or 40 deg/sec.
  • Up to three overlapping sound events possible, temporally and spatially.
  • Simultaneous directional interfering sound events with their own temporal activities, static or moving.
  • Realistic spatial ambient noise collected from each room is added to the spatialized sound events, at varying signal-to-noise ratios (SNR) ranging from noiseless (30 dB) to noisy (6 dB) conditions.
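To make the SNR range above concrete, here is a minimal sketch of scaling a noise signal so that a mixture reaches a target SNR. It assumes SNR is defined as the ratio of RMS levels over the whole signal, which is a generic definition and not necessarily the one used when the dataset was generated; the function names are hypothetical and the signals are plain Python lists standing in for a single audio channel.

```python
import math

def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def noise_gain_for_snr(signal, noise, snr_db):
    """Gain for `noise` so that rms(signal) / rms(gain * noise) equals snr_db."""
    noise_rms = rms(noise)
    if noise_rms == 0.0:
        raise ValueError("noise is silent, so no gain can reach the target SNR")
    return rms(signal) / (noise_rms * 10.0 ** (snr_db / 20.0))

def mix_at_snr(signal, noise, snr_db):
    """Add scaled noise to the signal, sample by sample, at the target SNR."""
    gain = noise_gain_for_snr(signal, noise, snr_db)
    return [s + gain * n for s, n in zip(signal, noise)]

# Toy signals for illustration only.
events = [1.0, -1.0, 1.0, -1.0]
ambience = [0.5, 0.5, -0.5, -0.5]

noisy_scene = mix_at_snr(events, ambience, 6.0)   # the "noisy" end of the range
clean_scene = mix_at_snr(events, ambience, 30.0)  # the "noiseless" end of the range
```

A lower target SNR yields a larger noise gain, so the 6 dB mixture carries considerably more ambient noise than the 30 dB one.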

The RIRs were collected in Finland by staff of Tampere University between 12/2017 and 06/2018, and between 11/2019 and 01/2020. The data collection received funding from the European Research Council, grant agreement 637422 EVERYSOUND.

More detailed information on the dataset can be found in the included README file.

EXAMPLE APPLICATION:

An implementation of a trainable model of a convolutional recurrent neural network, performing joint SELD, trained and evaluated with this dataset will be provided soon. That implementation will serve as the baseline method in the DCASE 2021 Sound Event Localization and Detection Task.

DEVELOPMENT AND EVALUATION:

The current and final version (Version 1.2) of the dataset includes the 600 development audio recordings and labels, used by the participants of Task 3 of DCASE2021 Challenge to train and validate their submitted systems, and the 200 evaluation audio recordings including their labels, used in the evaluation phase of DCASE2021.

If researchers wish to compare their system against the submissions of DCASE2021 Challenge, they will have directly comparable results if they use the evaluation data as their testing set.

DOWNLOAD INSTRUCTIONS:

The two files, foa_dev.z01 and foa_dev.zip, correspond to audio data of the FOA recording format.
The two files, mic_dev.z01 and mic_dev.zip, correspond to audio data of the MIC recording format.
The metadata_dev.zip is the common metadata for both formats.

The file, foa_eval.zip, corresponds to audio data of the FOA recording format for the evaluation dataset.
The file, mic_eval.zip, corresponds to audio data of the MIC recording format for the evaluation dataset.
The metadata_eval.zip is the common metadata for both formats.

Download the zip files corresponding to the format of interest and use your preferred compression tool to extract these split zip files. To extract a split zip archive (named as zip, z01, z02, ...), you could use, for example, the following syntax in a Linux or OSX terminal:

  1. Combine the split archive to a single archive:
    zip -s 0 split.zip --out single.zip
  2. Extract the single archive using unzip:
    unzip single.zip

Files

Files (15.2 GB)

  • md5:270a94dc5cd183ea6532c5a3f6e9036c, 4.3 GB
  • md5:80648b5f64b1b4a824084560f1334f54, 1.4 GB
  • md5:591f8d2b500a671ae34822b4ff1e0889, 1.9 GB
  • md5:cd8cd8b4dc9a3e3df91ac55c1ccf73b7, 1.9 MB
  • md5:11c021253c8b55fd74083bd0a35c2ee4, 660.4 kB
  • md5:536a5ba37b0c39f54044932b75acb774, 4.3 GB
  • md5:a5131297547431160b732a3481626c2d, 1.4 GB
  • md5:3248aef229ab4e0e0603d7c2269f4f97, 1.9 GB
  • md5:73332d04f05ed6568210731fb9296593, 22.7 kB
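The md5 checksums listed above can be used to confirm that a download completed without corruption. The sketch below is one possible way to do this with Python's standard library; the helper names are hypothetical, and the pairing of each checksum with its file name has to be taken from the dataset's download page, since it is not reproduced here.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_checksum(path, expected):
    """Compare a file against a checksum written as 'md5:<hex>' or as bare hex."""
    expected_hex = expected.strip().lower().split(":", 1)[-1]
    return md5_of_file(path) == expected_hex

# Example call (the path and checksum are placeholders to be filled in by the user):
# ok = matches_checksum("path/to/downloaded_file.zip", "md5:<checksum from the file list>")
```

For split archives, each part (for example the .z01 and .zip files) is checked separately before the parts are combined.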

Additional details

Funding

EVERYSOUND – Computational Analysis of Everyday Soundscapes 637422
European Commission

References

  • Archontis Politis, Annamaria Mesaros, Sharath Adavanne, Toni Heittola, and Tuomas Virtanen (2020). Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 684-698.
  • Archontis Politis, Sharath Adavanne, and Tuomas Virtanen (2020). A Dataset of Reverberant Spatial Sound Scenes with Moving Sources for Sound Event Localization and Detection. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2020 Workshop (DCASE2020), Tokyo, Japan.
  • Sharath Adavanne, Archontis Politis, and Tuomas Virtanen (2019). A Multi-room reverberant dataset for sound event localization and detection. Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019), New York, NY, USA.
  • Ivo Trowitzsch, Jalil Taghia, Youssef Kashef, and Klaus Obermayer (2019). The NIGENS general sound events database. Technische Universität Berlin, Tech. Rep. arXiv:1902.08314 [cs.SD]