Free Universal Sound Separation Dataset

Scott Wisdom; Hakan Erdogan; Dan Ellis; John R. Hershey

doi:10.5281/zenodo.4012661

Published March 4, 2020 | Version 1.3

Resource type: Dataset (open access)

Affiliation: Google Research

Contributors

Researchers:

1. LORIA
2. INRIA
3. Adobe Research
4. Northwestern University
5. Universitat Pompeu Fabra (UPF)

The Free Universal Sound Separation (FUSS) Dataset is a database of arbitrary sound mixtures and source-level references, for use in experiments on arbitrary sound separation.

This is the official sound separation data for the DCASE2020 Challenge Task 4: Sound Event Detection and Separation in Domestic Environments.

Citation: If you use the FUSS dataset or part of it, please cite our paper describing the dataset and baseline [1]. FUSS is based on FSD data, so please also cite [2].

Overview: FUSS audio data is sourced from a pre-release of the Freesound dataset known as FSD50K, a sound event dataset composed of Freesound content annotated with labels from the AudioSet Ontology. Using the FSD50K labels, these source files have been screened so that they likely contain only a single type of sound. Labels are not provided for these source files and are not considered part of the challenge. For the purpose of DCASE2020 Challenge Task 4 (Sound Event Detection and Separation), systems should not use FSD50K labels, even though they may become available upon the FSD50K release.

To create mixtures, 10-second clips of sources are convolved with simulated room impulse responses and added together. Each 10-second mixture contains between 1 and 4 sources. Source files longer than 10 seconds are considered "background" sources. Every mixture contains exactly one background source, which is active for the entire duration. We provide a software recipe to create the dataset, the room impulse responses, and the original source audio.
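The mixture layout described above (one full-length background plus up to three shorter foreground events in a 10-second clip) can be sketched as a small sampler. This is a minimal illustration, not the FUSS recipe itself, which uses scaper [5]: the uniform choice of source count and onset, and the names `Event` and `sample_mixture`, are assumptions made for this example.

```python
import random
from dataclasses import dataclass

MIX_SECONDS = 10.0  # mixture length stated in the description
MAX_SOURCES = 4     # each mixture has between 1 and 4 sources

@dataclass
class Event:
    path: str
    onset: float     # seconds from the start of the mixture
    duration: float  # seconds
    is_background: bool

def sample_mixture(backgrounds, foregrounds, rng):
    """Sample one hypothetical mixture specification.

    backgrounds: paths of source files longer than 10 s
    foregrounds: (path, duration_seconds) tuples for shorter files
    Uniform source counts and onsets are assumptions, not the FUSS recipe.
    """
    n_sources = rng.randint(1, MAX_SOURCES)  # inclusive: 1..4
    # Exactly one background source, active for the whole clip.
    events = [Event(rng.choice(backgrounds), 0.0, MIX_SECONDS, True)]
    # The remaining sources are foreground events at random times.
    for path, dur in rng.sample(foregrounds, n_sources - 1):
        dur = min(dur, MIX_SECONDS)
        onset = rng.uniform(0.0, MIX_SECONDS - dur)
        events.append(Event(path, onset, dur, False))
    return events
```

With a single sampled source the mixture consists of the background alone, which matches the "between 1 and 4 sources" range.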

Motivation for use in DCASE2020 Challenge Task 4: This dataset provides a platform to investigate how source separation may help with event detection and vice versa. Previous work has shown that universal sound separation (separation of arbitrary sounds) is possible [3], and that event detection can help with universal sound separation [4]. It remains to be seen whether sound separation can help with event detection. Event detection is more difficult in noisy environments, so separation could be a useful pre-processing step. Data with strong labels for event detection are relatively scarce, especially when restricted to specific classes within a domain. In contrast, source separation data need no event labels for training and may be more plentiful. The idea in this setting is to use larger amounts of unlabeled separation data to train separation systems, which can then serve as front-ends to event detection systems trained on more limited labeled data.

Room simulation: Room impulse responses are simulated using the image method with frequency-dependent walls. Each simulated room is rectangular, with random size and random wall materials, and contains a single microphone and up to 4 sources placed at random spatial locations.
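To make the image method concrete, the sketch below builds a toy impulse response for a rectangular ("shoebox") room from the direct path and the six first-order wall reflections: each reflection is modeled as a mirrored copy of the source, delayed by its distance over the speed of sound and attenuated by 1/r. This is a deliberate simplification of what the description states: it assumes a single frequency-independent reflection coefficient `beta` and first-order images only, whereas the FUSS room simulation uses frequency-dependent walls and is not reproduced here. The function name and parameters are hypothetical.

```python
import numpy as np

def first_order_rir(room, src, mic, fs=16000, beta=0.8, c=343.0, length=4000):
    """Toy shoebox RIR: direct path plus six first-order image sources.

    room: (Lx, Ly, Lz) in metres; src, mic: (x, y, z) inside the room.
    beta: one frequency-independent reflection coefficient (an assumption).
    """
    room, src, mic = (np.asarray(v, dtype=float) for v in (room, src, mic))
    images = [(src, 1.0)]  # direct path, no reflection loss
    for axis in range(3):
        for wall in (0.0, room[axis]):
            img = src.copy()
            img[axis] = 2.0 * wall - src[axis]  # mirror the source across the wall
            images.append((img, beta))
    h = np.zeros(length)
    for pos, gain in images:
        dist = np.linalg.norm(pos - mic)
        n = int(round(dist / c * fs))  # arrival time, rounded to a sample
        if n < length:
            h[n] += gain / dist  # 1/r spherical spreading loss
    return h
```

Real image-method simulators add higher-order reflections, fractional delays, and per-band wall absorption; this version only shows the geometric idea.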

Recipe for data creation: The data creation recipe starts with scripts, based on scaper [5], that generate mixtures of events with random timing of source events, along with a background source that spans the duration of the mixture clip. The scripts for this are available in the project's GitHub repository.

The data are reverberated using a different room simulation for each mixture. In this simulation, each source has its own reverberation corresponding to a different spatial location. The reverberated mixtures are created by summing the reverberated sources. The dataset recipe scripts support modification, so that participants may remix and augment the training data as desired.
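The reverberation step can be sketched as follows: each dry source is convolved with its own impulse response, and the mixture is the sum of the reverberated sources, so the per-source references add up exactly to the mixture. This is a minimal sketch, assuming equal-length dry sources and truncation of the convolution tail back to the dry length; the actual recipe scripts may handle the tail differently. `reverberate_and_mix` is a hypothetical name.

```python
import numpy as np

def reverberate_and_mix(sources, rirs):
    """Convolve each dry source with its own RIR and sum into a mixture.

    sources: list of 1-D arrays of equal length (e.g. 10 s at 16 kHz)
    rirs: list of 1-D impulse responses, one per source
    Returns (mixture, reverberated_sources), truncated to the dry length
    (the truncation is an assumption of this sketch).
    """
    n = len(sources[0])
    wet = [np.convolve(s, h)[:n] for s, h in zip(sources, rirs)]
    mixture = np.sum(wet, axis=0)  # mixture = sum of reverberated sources
    return mixture, wet
```

Keeping the reverberated sources, rather than only the dry ones, is what lets them serve directly as separation targets for the reverberant mixtures.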

The constituent source files for each mixture are also generated for use as references for training and evaluation.

Note: No attempt was made to remove digital silence from the Freesound source data, so some reference sources may include digital silence, and in a few mixtures the background reference is entirely digital silence. Digital silence also occurs in the event recognition public evaluation data, so it is important to handle it in practice. Our evaluation scripts handle it by ignoring any reference sources that are silent.
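One way to ignore silent references during evaluation is sketched below: compute a scale-invariant SNR (SI-SNR) per estimate/reference pair and average only over references that are not entirely zero. This is an illustration, not the official evaluation scripts; it assumes the estimates are already matched one-to-one to references (for example after permutation matching), and SI-SNR is used here as a representative separation metric.

```python
import numpy as np

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR in dB between an estimate and a reference."""
    alpha = np.dot(est, ref) / (np.dot(ref, ref) + eps)  # optimal scaling
    target = alpha * ref
    noise = est - target
    return 10 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))

def mean_si_snr_nonsilent(estimates, references):
    """Average SI-SNR over matched pairs, skipping all-zero references."""
    scores = [si_snr(e, r) for e, r in zip(estimates, references) if np.any(r != 0)]
    return float(np.mean(scores)) if scores else float("nan")
```

Without the filter, an all-zero reference would make the target energy zero and the score meaningless, which is why such references are dropped rather than scored.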

Format: All audio clips are provided as uncompressed 16-bit PCM, 16 kHz, mono audio files.
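A quick way to confirm that a downloaded clip matches this format is Python's standard `wave` module. This is a small sketch; the helper name `check_fuss_format` is hypothetical, and it assumes the clips are standard RIFF WAV files.

```python
import wave

def check_fuss_format(path):
    """Return True if a WAV file is uncompressed 16-bit PCM, 16 kHz, mono."""
    with wave.open(path, "rb") as w:
        return (w.getnchannels() == 1           # mono
                and w.getsampwidth() == 2       # 2 bytes per sample = 16 bit
                and w.getframerate() == 16000   # 16 kHz
                and w.getcomptype() == "NONE")  # uncompressed PCM
```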

Data split: The FUSS dataset is partitioned into "train", "validation", and "eval" sets, following the same splits used in FSD data. Specifically, the train and validation sets are sourced from the FSD50K dev set, and we have ensured that clips in train come from different uploaders than the clips in validation. The eval set is sourced from the FSD50K eval split.
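The uploader constraint amounts to a grouped split: all clips from one uploader go to the same side. The sketch below shows the idea under stated assumptions: the input is a list of hypothetical `(clip_id, uploader)` pairs, the validation fraction is applied to uploaders rather than clips, and the seed and fraction are illustrative. It is not the procedure used to build FUSS.

```python
import random
from collections import defaultdict

def split_by_uploader(clips, val_fraction=0.1, seed=0):
    """Split (clip_id, uploader) pairs so no uploader appears in both sets.

    val_fraction is applied to the number of uploaders (an assumption).
    """
    by_uploader = defaultdict(list)
    for clip_id, uploader in clips:
        by_uploader[uploader].append(clip_id)
    uploaders = sorted(by_uploader)           # deterministic order before shuffling
    random.Random(seed).shuffle(uploaders)
    n_val = max(1, round(val_fraction * len(uploaders)))
    val_up = set(uploaders[:n_val])
    train = [c for u in uploaders if u not in val_up for c in by_uploader[u]]
    val = [c for u in uploaders if u in val_up for c in by_uploader[u]]
    return train, val
```

Splitting by uploader rather than by clip avoids near-duplicate recordings from the same person leaking between train and validation.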

Baseline System: A baseline system for the FUSS dataset is available at dcase2020_fuss_baseline.

License: All audio clips (i.e., in FUSS_fsd_data.tar.gz) used in the preparation of the Free Universal Sound Separation (FUSS) dataset are designated Creative Commons Zero (CC0) and were obtained from freesound.org. The source data in FUSS_fsd_data.tar.gz were selected using labels from the FSD50K corpus, which is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) License.

The FUSS dataset as a whole is a curated, reverberated, mixed, and partitioned preparation, and is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) License. This license is specified in the `LICENSE-DATASET` file included in `FUSS_license_doc.tar.gz`.

Notes:

Added in v1.2:

FUSS_baseline_dry_model.tar.gz: baseline separation model trained on non-reverberated (dry) data.
FUSS_DESED_baseline_dry_2_model.tar.gz: baseline separation model for the DESED task, trained on a mixture of DESED in-domain data and FUSS data

Added in v1.3:

FUSS_DESED_baseline_dry_1_model.tar.gz: baseline separation model for the DESED task, trained to separate DESED mixtures from dry FUSS mixtures (DmFm)
FUSS_DESED_baseline_dry_4_model.tar.gz: baseline separation model for the DESED task, trained to separate DESED background, dry FUSS mixture, and 5 DESED foreground sources with PIT (PIT)
FUSS_DESED_baseline_dry_4np_model.tar.gz: baseline separation model for the DESED task, trained to separate DESED background, 10 DESED classes, and dry FUSS mixture without PIT (Classwise)
FUSS_DESED_baseline_dry_6_model.tar.gz: baseline separation model for the DESED task, trained to separate DESED background, 5 DESED foreground sources, 4 dry FUSS sources, with groupwise PIT (GroupPIT)

The names in parentheses are the task names from Table 3 of the following paper: Nicolas Turpault, Scott Wisdom, Hakan Erdogan, John R. Hershey, Romain Serizel, Eduardo Fonseca, Prem Seetharaman, and Justin Salamon, "Improving Sound Event Detection in Domestic Environments using Sound Separation", DCASE 2020.
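The PIT, Classwise, and GroupPIT variants differ only in which output-to-reference assignments the training loss may choose from. The sketch below illustrates that distinction: permutation-invariant training (PIT) takes the best loss over all orderings of the outputs, groupwise PIT permutes only within groups of output slots, and classwise training corresponds to singleton groups (a fixed order). Negative SNR is used purely for illustration; the loss actually used to train these baseline models is not specified here, and all function names are hypothetical.

```python
import itertools
import numpy as np

def neg_snr(est, ref, eps=1e-8):
    """Negative SNR in dB (lower is better); an illustrative loss."""
    return -10 * np.log10((np.sum(ref ** 2) + eps) / (np.sum((ref - est) ** 2) + eps))

def pit_loss(estimates, references):
    """Best mean loss over all orderings of the outputs.

    estimates, references: arrays of shape (num_sources, num_samples).
    Returns (loss, perm) where estimates[perm[i]] is matched to references[i].
    """
    best = None
    for perm in itertools.permutations(range(len(references))):
        loss = np.mean([neg_snr(estimates[p], references[i]) for i, p in enumerate(perm)])
        if best is None or loss < best[0]:
            best = (loss, perm)
    return best

def group_pit_loss(estimates, references, groups):
    """Groupwise PIT: permute only within each group of output slots.

    groups: lists of slot indices, e.g. [[0], [1, 2, 3]]; singleton
    groups everywhere reproduce classwise (fixed-order) training.
    """
    total, perm = 0.0, list(range(len(references)))
    for g in groups:
        loss, p = pit_loss(estimates[g], references[g])
        total += loss * len(g)      # weight by group size for an overall mean
        for i, j in enumerate(p):
            perm[g[i]] = g[j]       # map local indices back to global slots
    return total / len(references), tuple(perm)
```

Exhaustive search over permutations is factorial in the number of outputs, which is manageable for the handful of slots used here.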

Files (25.2 GB)

Name	MD5	Size
FUSS_baseline_dry_model.tar.gz	aca2b32125659a420bd5c307e2505e43	102.3 MB
FUSS_baseline_model.tar.gz	fb803bc4b0253912bfe823d4df508a6b	102.3 MB
FUSS_DESED_baseline_dry_1_model.tar.gz	17ca2bb89949a117579f5c34afbe443c	99.9 MB
FUSS_DESED_baseline_dry_2_model.tar.gz	00174838e1454a9692eebe21e1bce616	101.0 MB
FUSS_DESED_baseline_dry_4_model.tar.gz	a83120258e77955e1a609765483c69b2	103.9 MB
FUSS_DESED_baseline_dry_4np_model.tar.gz	aa0282b99604040e65613d8047ba98e8	107.0 MB
FUSS_DESED_baseline_dry_6_model.tar.gz	7b2252c2f07207edaf37d00739390180	106.1 MB
FUSS_fsd_data.tar.gz	79742c73e0d4164f35207e6e74487098	1.9 GB
FUSS_license_doc.tar.gz	7d00e8b03601698413045b7293c04fc2	4.0 kB
FUSS_rir_data.tar.gz	e43c7d6d29332eae4bb193f7a940b194	5.8 GB
FUSS_ssdata.tar.gz	d1f35069880185056ab1bde5aa1da1e7	8.9 GB
FUSS_ssdata_reverb.tar.gz	0ee3f1229194b5b7ab2cdf8a12add0b6	7.9 GB
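Given the sizes involved, it is worth checking each archive against its listed MD5 before extracting. The sketch below uses Python's `hashlib` and reads files in chunks so multi-gigabyte archives are not loaded into memory; the checksums are copied from the list above, while the helper names and chunk size are illustrative choices.

```python
import hashlib
import os

# Checksums copied from the file list above (data and license archives).
EXPECTED_MD5 = {
    "FUSS_fsd_data.tar.gz": "79742c73e0d4164f35207e6e74487098",
    "FUSS_license_doc.tar.gz": "7d00e8b03601698413045b7293c04fc2",
    "FUSS_rir_data.tar.gz": "e43c7d6d29332eae4bb193f7a940b194",
    "FUSS_ssdata.tar.gz": "d1f35069880185056ab1bde5aa1da1e7",
    "FUSS_ssdata_reverb.tar.gz": "0ee3f1229194b5b7ab2cdf8a12add0b6",
}

def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading 1 MiB at a time."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path):
    """True if the file's MD5 matches the listed checksum for its name."""
    return md5sum(path) == EXPECTED_MD5.get(os.path.basename(path))
```

For example, `verify("FUSS_ssdata.tar.gz")` returns `True` only for an intact download of that archive.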

References

[1] Scott Wisdom, Hakan Erdogan, Daniel P. W. Ellis, Romain Serizel, Nicolas Turpault, Eduardo Fonseca, Justin Salamon, Prem Seetharaman, and John R. Hershey. "What's All the FUSS About Free Universal Sound Separation Data?" In preparation, 2020.
[2] Eduardo Fonseca, Jordi Pons, Xavier Favory, Frederic Font Corbera, Dmitry Bogdanov, Andrés Ferraro, Sergio Oramas, Alastair Porter, and Xavier Serra. "Freesound Datasets: A Platform for the Creation of Open Audio Datasets." International Society for Music Information Retrieval Conference (ISMIR), pp. 486–493. Suzhou, China, 2017.
[3] Ilya Kavalerov, Scott Wisdom, Hakan Erdogan, Brian Patton, Kevin Wilson, Jonathan Le Roux, and John R. Hershey. "Universal Sound Separation." IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 175–179. New Paltz, NY, USA, 2019.
[4] Efthymios Tzinis, Scott Wisdom, John R. Hershey, Aren Jansen, and Daniel P. W. Ellis. "Improving Universal Sound Separation Using Sound Classification." IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020.
[5] J. Salamon, D. MacConnell, M. Cartwright, P. Li, and J. P. Bello. "Scaper: A Library for Soundscape Synthesis and Augmentation." IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). New Paltz, NY, USA, Oct. 2017.
