Published September 11, 2024 | Version v1
Dataset Open

Auditory Scene Analysis dataset (Multichannel universal sound separation & polyphonic audio classification)

  • 1. Korea Advanced Institute of Science and Technology

Description

We constructed a new dataset for multichannel universal sound separation and polyphonic audio classification tasks.

The dataset is designed to reflect various conditions, including moving sources with temporal onsets and offsets. For foreground sound sources, signals from 13 audio classes were selected from open-source databases (Pixabay, FSD50K, LibriSpeech, MUSDB18, and VocalSound). These signals were resampled to 16 kHz and pre-processed by either zero-padding or cropping to 4 seconds. Each sound source has a 75% probability of being a moving source, with speeds ranging from 0 to 3 m/s. Each mixture contains between 2 and 4 foreground sound sources, along with one background noise signal from the diffused TAU-SNoise dataset at a signal-to-noise ratio (SNR) ranging from 6 to 30 dB.

The simulations were conducted using gpuRIR. Room dimensions were set to a width and length between 5 and 8 meters and a height between 3 and 4 meters, with reverberation times ranging from 0.2 to 0.6 seconds. These parameters were sampled from uniform distributions. We simulated spatialized sound sources using a 4-channel tetrahedral microphone array with a radius of 4.2 cm.
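The parameter ranges above can be illustrated with a minimal Python sketch. It is not the authors' generation code (the actual procedure is in the paper and relies on gpuRIR). The function names, the choice to pad at the end and crop from the start, the uniform draw for the number of sources and for speed, and the power-based SNR definition are all assumptions made for illustration.

```python
import math
import random

SAMPLE_RATE = 16_000                        # Hz, from the description
CLIP_SECONDS = 4                            # clip length in seconds, from the description
CLIP_SAMPLES = SAMPLE_RATE * CLIP_SECONDS   # 64,000 samples per clip


def pad_or_crop(signal, length=CLIP_SAMPLES):
    """Return a list of exactly `length` samples.

    Assumption: zeros are appended at the end and cropping keeps the start;
    the description does not say where padding or cropping is applied.
    """
    if len(signal) >= length:
        return list(signal[:length])
    return list(signal) + [0.0] * (length - len(signal))


def sample_scene(rng):
    """Draw one hypothetical scene configuration from the stated ranges."""
    n_sources = rng.randint(2, 4)           # 2 to 4 foreground sources (inclusive)
    sources = []
    for _ in range(n_sources):
        moving = rng.random() < 0.75        # 75% probability of a moving source
        # Assumption: speed is drawn uniformly; static sources get speed 0.
        speed = rng.uniform(0.0, 3.0) if moving else 0.0
        sources.append({"moving": moving, "speed_mps": speed})
    return {
        # (width, length, height) in meters
        "room_m": (rng.uniform(5, 8), rng.uniform(5, 8), rng.uniform(3, 4)),
        "rt60_s": rng.uniform(0.2, 0.6),    # reverberation time in seconds
        "snr_db": rng.uniform(6, 30),       # SNR of the background noise in dB
        "sources": sources,
    }


def noise_gain_for_snr(signal, noise, snr_db):
    """Gain for `noise` so that mean signal power / mean noise power equals snr_db.

    Assumption: SNR is defined on mean power over the whole clip.
    """
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    return math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
```

For example, `sample_scene(random.Random(0))` returns one reproducible configuration, and multiplying the noise by `noise_gain_for_snr(mix, noise, scene["snr_db"])` scales it to the sampled SNR under the assumptions above.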

The procedure for dataset generation and details about class configuration and durations of audio clips are provided in the paper. This dataset poses a significant challenge for separation tasks due to the inclusion of moving sources, onset and offset conditions, overlapped in-class sources, and noisy reverberant environments.

Files

Files (34.9 GB)

Name Size
ASA_20k_4s_nspk2-4.zip 34.9 GB
md5:2608279e515791fee2568ce66d1fb438
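Since the archive is large, it may be worth checking the download against the listed MD5 checksum before extracting it. The following is a minimal sketch using Python's standard `hashlib`. The helper names and the local path are hypothetical, and the file is read in chunks so the 34.9 GB archive does not need to fit in memory.

```python
import hashlib

# Checksum copied from the file listing above.
EXPECTED_MD5 = "2608279e515791fee2568ce66d1fb438"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_download("ASA_20k_4s_nspk2-4.zip")` (assuming the archive was saved under its original name in the working directory) returns `True` only when the digests match.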

Additional details

Dates

Available
2024-09-12