
Published May 31, 2020 | Version 1.1.0
Dataset | Open Access

TAU-NIGENS Spatial Sound Events 2020

  • Tampere University



The TAU-NIGENS Spatial Sound Events 2020 dataset contains multiple spatial sound-scene recordings, consisting of sound events of distinct categories integrated into a variety of acoustical spaces, from multiple source directions and distances as seen from the recording position. The spatialization of all sound events is based on filtering through real spatial room impulse responses (RIRs) captured in multiple rooms of various shapes, sizes, and acoustical absorption properties. Each scene recording is delivered in two spatial recording formats: a tetrahedral microphone array format (MIC) and a first-order Ambisonics format (FOA). Sound events are spatialized either as stationary sources in the room or as moving sources, in which case time-variant RIRs are used. Each sound event in the sound scene is associated with a trajectory of its direction of arrival (DoA) relative to the recording position, and with onset and offset times. The isolated sound event recordings used for the synthesis of the sound scenes are taken from the NIGENS general sound events database. These recordings serve as the development dataset for the Sound Event Localization and Detection Task of the DCASE 2020 Challenge.
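The RIR-based spatialization described above can be illustrated with a minimal sketch. This is not the dataset's own synthesis code; it only shows the general principle for a static source, with hypothetical toy data: a mono event signal is convolved with one measured impulse response per output channel, yielding a 4-channel (FOA or MIC) spatialized event.

```python
# Sketch of static-source spatialization by RIR filtering.
# The event signal and the two-tap RIRs below are hypothetical toy data;
# real RIRs are thousands of taps long and sampled at 24 kHz.

def convolve(signal, ir):
    """Direct-form FIR convolution of two sequences."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(ir):
            out[n + k] += s * h
    return out

def spatialize(event, rirs):
    """Filter a mono event through one RIR per channel (4 for FOA/MIC)."""
    return [convolve(event, ir) for ir in rirs]

event = [1.0, 0.5, 0.25]                                       # mono event
rirs = [[1.0, 0.0], [0.5, 0.1], [0.25, 0.2], [0.1, 0.3]]       # one RIR per channel
channels = spatialize(event, rirs)                             # 4-channel result
```

For a moving source, the dataset instead applies time-variant RIRs, i.e. the filter changes as the source traverses its DoA trajectory; the static convolution above is the simplest case.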


The dataset includes a large number of mixtures of sound events with realistic spatial properties under different acoustic conditions, and hence it is suitable for training and evaluation of machine-listening models for sound event detection (SED), general sound source localization with diverse sounds or signal-of-interest localization, and joint sound-event-localization-and-detection (SELD). Additionally, the dataset can be used for evaluation of signal processing methods that do not necessarily rely on training, such as acoustic source localization methods and multiple-source acoustic tracking. The dataset allows evaluation of the performance and robustness of the aforementioned applications for diverse types of sounds, and under diverse acoustic conditions.


  • 600 one-minute long sound scene recordings (development dataset).
  • 200 one-minute long sound scene recordings (evaluation dataset).
  • Sampling rate: 24 kHz.
  • About 700 sound event samples spread over 14 classes (drawn from the NIGENS general sound events database).
  • 6 provided cross-validation splits of 100 recordings each, with unique sound event samples and rooms in each of them.
  • Two 4-channel 3-dimensional recording formats: first-order Ambisonics (FOA) and tetrahedral microphone array.
  • Realistic spatialization and reverberation through RIRs collected in 15 different enclosures.
  • From about 1500 to 3500 possible RIR positions across the different rooms.
  • Both static reverberant and moving reverberant sound events.
  • Up to two overlapping sound events allowed, temporally and spatially.
  • Realistic spatial ambient noise collected from each room is added to the spatialized sound events at varying signal-to-noise ratios (SNR), ranging from noiseless (30 dB) to noisy (6 dB).
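The SNR-controlled noise mixing in the last bullet can be sketched as follows. The dataset's exact mixing procedure is not reproduced here; this assumes standard RMS-based scaling, where the ambient noise is multiplied by a gain chosen so that the event-to-noise power ratio matches the target SNR.

```python
import math

def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def noise_gain(signal, noise, target_snr_db):
    """Gain to apply to `noise` so that the signal-to-scaled-noise
    ratio equals the target SNR in dB (assumed RMS-based convention)."""
    return rms(signal) / (rms(noise) * 10 ** (target_snr_db / 20.0))

# Hypothetical toy signals with known RMS levels.
signal = [1.0, -1.0, 1.0, -1.0]   # RMS = 1.0
noise  = [0.5, -0.5, 0.5, -0.5]   # RMS = 0.5
g = noise_gain(signal, noise, 6.0)          # 6 dB, the noisiest setting
mix = [s + g * n for s, n in zip(signal, noise)]
```

The same gain computation covers the whole 30 dB to 6 dB range used in the dataset; only `target_snr_db` changes.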

The IRs were collected in Finland by staff of Tampere University between 12/2017 and 06/2018, and between 11/2019 and 01/2020. The older measurements, from five of the rooms, were also used for the earlier development and evaluation datasets TAU Spatial Sound Events 2019, while ten additional rooms were added for this dataset. The data collection received funding from the European Research Council, grant agreement 637422 EVERYSOUND.

More detailed information on the dataset can be found in the included README file.


An implementation of a trainable convolutional recurrent neural network (CRNN) model performing joint SELD, trained and evaluated with this dataset, is also provided. This implementation serves as the baseline method in the DCASE 2020 Sound Event Localization and Detection Task.
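When comparing a SELD model's localization output against this dataset's reference DoA trajectories, a common step is measuring the angular distance between a reference and an estimated DoA. The sketch below assumes the usual azimuth/elevation convention (angles in degrees, elevation measured from the horizontal plane); the challenge's exact metric definitions are given in the task description, not here.

```python
import math

def doa_to_vector(azimuth_deg, elevation_deg):
    """Unit vector from azimuth/elevation in degrees (assumed convention:
    azimuth in the horizontal plane, elevation measured from it)."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))

def angular_error_deg(ref, est):
    """Great-circle angle in degrees between two DoAs given as (az, el)."""
    u, v = doa_to_vector(*ref), doa_to_vector(*est)
    dot = sum(a * b for a, b in zip(u, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

# Hypothetical example: reference at 30° azimuth on the horizon,
# estimate at 40° azimuth.
err = angular_error_deg((30.0, 0.0), (40.0, 0.0))
```

Working with unit vectors rather than raw angle differences avoids azimuth wrap-around issues and handles sources at high elevations correctly.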


The three files foa_dev.z01, foa_dev.z02, and foa_dev.zip correspond to audio data of the FOA recording format.
The three files mic_dev.z01, mic_dev.z02, and mic_dev.zip correspond to audio data of the MIC recording format.
The file metadata_dev.zip contains the common metadata for both formats.

The file foa_eval.zip corresponds to audio data of the FOA recording format for the evaluation dataset.
The file mic_eval.zip corresponds to audio data of the MIC recording format for the evaluation dataset.

Download the zip files corresponding to the format of interest and use your favorite compression tool to unzip these split zip files. To extract a split zip archive (named as .zip, .z01, .z02, ...), you can use, for example, the following commands in a Linux or macOS terminal:

  1. Combine the split archive into a single archive:
    zip -s 0 foa_dev.zip --out foa_dev_combined.zip
  2. Extract the combined archive using unzip:
    unzip foa_dev_combined.zip


Files (14.0 GB)

Name              Size
foa_dev.z01       2.1 GB
foa_dev.z02       2.1 GB
foa_dev.zip       1.0 GB
foa_eval.zip      1.7 GB
metadata_dev.zip  1.2 MB
mic_dev.z01       2.1 GB
mic_dev.z02       2.1 GB
mic_dev.zip       936.6 MB
mic_eval.zip      1.7 GB
README.html       17.7 kB

Additional details

Funding

  • EVERYSOUND – Computational Analysis of Everyday Soundscapes (grant agreement 637422), European Commission

References

  • Sharath Adavanne, Archontis Politis, and Tuomas Virtanen (2019). A multi-room reverberant dataset for sound event localization and detection. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019).
  • Ivo Trowitzsch, Jalil Taghia, Youssef Kashef, and Klaus Obermayer (2019). The NIGENS general sound events database. Technische Universität Berlin, Tech. Rep. arXiv:1902.08314 [cs.SD].