TUT Sound Events 2018 - Ambisonic, Reverberant and Synthetic Impulse Response Dataset

doi:10.5281/zenodo.1237707

Published April 30, 2018 | Version v1

Resource type: Dataset (open access)


1. Tampere University of Technology, Finland
2. Aalto University, Finland

Tampere University of Technology (TUT) Sound Events 2018 - Ambisonic, Reverberant and Synthetic Impulse Response Dataset

This dataset consists of simulated reverberant first-order Ambisonic (FOA) format recordings of stationary point sources, each associated with a spatial coordinate. It comprises three sub-datasets with a) at most one, b) at most two, and c) at most three temporally overlapping sound events. Each sub-dataset has three cross-validation splits, each consisting of 240 recordings of about 30 seconds for the training split and 60 recordings of the same length for the testing split. For each recording, a metadata file with the same name lists, for every sound event, its name, its onset and offset times (in seconds), its spatial location as azimuth and elevation angles (in degrees), and its distance from the microphone (in meters). The sound events are spatially placed within a room using the image source method. The simulated room measures 10 × 8 × 4 m, with reverberation times of [1.0, 0.8, 0.7, 0.6, 0.5, 0.4] s for the octave bands centered at 125 Hz, 250 Hz, 500 Hz, 1 kHz, 2 kHz and 4 kHz, respectively.
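The sketch below shows one way to load a metadata file and sanity-check it against the sub-dataset it came from. It assumes each metadata file is a CSV with the header `sound_event_recording,start_time,end_time,ele,azi,dist`; that column naming is an assumption to be checked against a real file, not something stated above. The names `SoundEvent`, `parse_metadata`, `max_polyphony` and `to_cartesian` are hypothetical helpers, and the Cartesian conversion assumes azimuth is measured counter-clockwise from the x axis and elevation up from the horizontal plane.

```python
import csv
import io
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SoundEvent:
    label: str
    onset: float      # seconds
    offset: float     # seconds
    azimuth: float    # degrees
    elevation: float  # degrees
    distance: float   # meters


def parse_metadata(text: str) -> List[SoundEvent]:
    """Parse one metadata CSV (ASSUMED header: sound_event_recording,start_time,end_time,ele,azi,dist)."""
    events = []
    for row in csv.DictReader(io.StringIO(text)):
        events.append(SoundEvent(
            label=row["sound_event_recording"],
            onset=float(row["start_time"]),
            offset=float(row["end_time"]),
            azimuth=float(row["azi"]),
            elevation=float(row["ele"]),
            distance=float(row["dist"]),
        ))
    return events


def max_polyphony(events: List[SoundEvent]) -> int:
    """Largest number of simultaneously active events (expected <= 1, 2 or 3 for ov1, ov2, ov3)."""
    # +1 at each onset, -1 at each offset. At equal times (t, -1) sorts before (t, +1),
    # so an event ending exactly when another starts is not counted as an overlap.
    boundaries = sorted([(e.onset, 1) for e in events] + [(e.offset, -1) for e in events])
    active = peak = 0
    for _, delta in boundaries:
        active += delta
        peak = max(peak, active)
    return peak


def to_cartesian(e: SoundEvent) -> Tuple[float, float, float]:
    """Microphone-centred x, y, z in meters (ASSUMED axis convention, see lead-in)."""
    az, el = math.radians(e.azimuth), math.radians(e.elevation)
    return (e.distance * math.cos(el) * math.cos(az),
            e.distance * math.cos(el) * math.sin(az),
            e.distance * math.sin(el))
```

For a file from an `ov2` archive, `max_polyphony(parse_metadata(text))` should not exceed 2; a larger value would suggest the assumed column mapping is wrong.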

The isolated sound events were taken from the DCASE 2016 task 2 dataset, which contains 11 sound event classes: Clearing throat, Coughing, Door knock, Door slam, Drawer, Human laughter, Keyboard, Keys (put on a table), Page turning, Phone ringing and Speech. The sound events are placed randomly on a spatial grid with 10-degree resolution, covering the full azimuth range and elevation angles in [-60°, 60°). Additionally, each sound event is placed at a random distance of at least 1 m from the microphone.
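A 10-degree grid over the full azimuth circle and the half-open elevation range [-60°, 60°) gives 36 × 12 = 432 candidate directions. The sketch below enumerates that grid and draws a random placement from it. The azimuth range [-180°, 180°) is an assumed convention (the description only says "full azimuth"), and because the upper bound on the source distance is not stated, `max_distance_m` is a hypothetical parameter; drawing the distance uniformly is likewise an assumption, not the documented procedure.

```python
import itertools
import random
from typing import List, Tuple

GRID_RES_DEG = 10
# ASSUMED azimuth convention [-180, 180); the description only says "full azimuth".
AZIMUTHS = list(range(-180, 180, GRID_RES_DEG))   # 36 values
ELEVATIONS = list(range(-60, 60, GRID_RES_DEG))   # [-60, 60) -> 12 values, -60 ... 50
GRID: List[Tuple[int, int]] = list(itertools.product(AZIMUTHS, ELEVATIONS))

MIN_DISTANCE_M = 1.0  # "at least 1 meter away from the microphone"


def sample_position(rng: random.Random, max_distance_m: float) -> Tuple[int, int, float]:
    """Draw (azimuth, elevation, distance); max_distance_m is a HYPOTHETICAL parameter."""
    if max_distance_m < MIN_DISTANCE_M:
        raise ValueError("max_distance_m must be at least 1 m")
    azimuth, elevation = rng.choice(GRID)                   # uniform over the 432 grid points
    distance = rng.uniform(MIN_DISTANCE_M, max_distance_m)  # ASSUMED uniform distance draw
    return azimuth, elevation, distance
```

For example, `sample_position(random.Random(0), 4.0)` returns one grid direction and a distance between 1 and 4 m; passing a seeded `random.Random` keeps the draw reproducible.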

The license of the dataset is given in the LICENSE file. Each of the remaining nine zip files contains the data for one combination of overlap and split. For example, ov3_split1.zip contains the audio and metadata folders for the case of at most three temporally overlapping sound events (ov3) and the first cross-validation split (split1). Within each audio and metadata folder, training-split filenames have the 'train' prefix, while testing-split filenames have the 'test' prefix.
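The naming scheme above can be expressed in a few lines. This is a sketch under the assumption that only the `ov{N}_split{M}.zip` archive names and the 'train'/'test' filename prefixes matter; the helper names and the member paths used in the example are hypothetical. `zipfile.ZipFile.namelist()` lets you inspect an archive's contents without extracting it.

```python
import itertools
import zipfile
from pathlib import PurePosixPath
from typing import Iterable, List, Tuple


def archive_names() -> List[str]:
    """The nine data archives: ov{1,2,3} x split{1,2,3}."""
    return [f"ov{ov}_split{split}.zip"
            for ov, split in itertools.product((1, 2, 3), (1, 2, 3))]


def partition_by_prefix(members: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split archive member paths into training and testing files by filename prefix."""
    train, test = [], []
    for m in members:
        name = PurePosixPath(m).name  # ignore folder part; directory entries never match
        if name.startswith("train"):
            train.append(m)
        elif name.startswith("test"):
            test.append(m)
    return sorted(train), sorted(test)


def list_split(zip_path: str) -> Tuple[List[str], List[str]]:
    """Read member names from a downloaded archive without extracting it."""
    with zipfile.ZipFile(zip_path) as zf:
        return partition_by_prefix(zf.namelist())
```

For instance, `partition_by_prefix(["foo/test_b.wav", "foo/train_a.wav"])` separates the two hypothetical paths into `(["foo/train_a.wav"], ["foo/test_b.wav"])`. Matching on the file name rather than the full path keeps the check independent of the exact folder names inside each archive.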

This dataset was created as part of the work 'Sound event localization and detection of overlapping sources using convolutional recurrent neural network'.

Files (21.5 GB)

Name	MD5 checksum	Size
LICENSE	a2075692e0ff3e3850f0b0e28dc76126	1.7 kB
ov1_split1.zip	1208be67e97f68565ef51e5c6922c655	2.1 GB
ov1_split2.zip	fe8f9e7071cd4105170cfc02f18df4ad	2.1 GB
ov1_split3.zip	b7eab263c765d88342ac1a1b6f4fc33e	2.1 GB
ov2_split1.zip	841ec5da40038f2ec5c2e65bc5e68f9d	2.5 GB
ov2_split2.zip	13f44c17d0c63e65fd8e4fd47dabbbff	2.5 GB
ov2_split3.zip	56b310b5b61ac1b8e95f9dadfe152b21	2.5 GB
ov3_split1.zip	04caad52a526de0a7817ecd6906f5569	2.6 GB
ov3_split2.zip	e4e157dae12699cfc1a5236b3804d800	2.6 GB
ov3_split3.zip	17568dfc2044e48a7e8a7eac107200f6	2.6 GB
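Given the size of the archives, it is worth checking each download against the MD5 checksums listed in the file table. A minimal sketch using only the standard library; the function names are illustrative, and files are hashed in chunks so that a 2.6 GB archive never has to be held in memory at once.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable

# Checksums copied from the file table.
EXPECTED_MD5 = {
    "LICENSE": "a2075692e0ff3e3850f0b0e28dc76126",
    "ov1_split1.zip": "1208be67e97f68565ef51e5c6922c655",
    "ov1_split2.zip": "fe8f9e7071cd4105170cfc02f18df4ad",
    "ov1_split3.zip": "b7eab263c765d88342ac1a1b6f4fc33e",
    "ov2_split1.zip": "841ec5da40038f2ec5c2e65bc5e68f9d",
    "ov2_split2.zip": "13f44c17d0c63e65fd8e4fd47dabbbff",
    "ov2_split3.zip": "56b310b5b61ac1b8e95f9dadfe152b21",
    "ov3_split1.zip": "04caad52a526de0a7817ecd6906f5569",
    "ov3_split2.zip": "e4e157dae12699cfc1a5236b3804d800",
    "ov3_split3.zip": "17568dfc2044e48a7e8a7eac107200f6",
}


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Incrementally hash a sequence of byte chunks."""
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    # Stream the file in 1 MiB chunks until read() returns b"" at end of file
    with open(path, "rb") as f:
        return md5_of_chunks(iter(lambda: f.read(chunk_size), b""))


def verify(directory: str) -> Dict[str, bool]:
    """Map each expected file name to True if it is present and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results
```

Running `verify("downloads")`, where `downloads` is a hypothetical folder holding the files, returns `False` for any file that is missing or corrupted, so it can be re-downloaded individually.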

Additional details

Funding: EVERYSOUND – Computational Analysis of Everyday Soundscapes (637422), European Commission

