
# Free Universal Sound Separation Dataset

Scott Wisdom; Hakan Erdogan; Dan Ellis; John R. Hershey

##### Researcher(s)
Romain Serizel; Nicolas Turpault; Justin Salamon; Prem Seetharaman; Eduardo Fonseca; Frederic Font Corbera

The Free Universal Sound Separation (FUSS) Dataset is a database of arbitrary sound mixtures and source-level references, for use in experiments on arbitrary sound separation.

This is the official sound separation data for the DCASE2020 Challenge Task 4: Sound Event Detection and Separation in Domestic Environments.

Citation: If you use the FUSS dataset or part of it, please cite our paper describing the dataset and baseline [1]. FUSS is based on FSD data, so please also cite [2].

Overview: FUSS audio data is sourced from a pre-release of the Freesound Dataset known as FSD50K, a sound event dataset composed of Freesound content annotated with labels from the AudioSet Ontology. Using the FSD50K labels, these source files have been screened such that they likely contain only a single type of sound. Labels are not provided for these source files and are not considered part of the challenge. For the purpose of the DCASE Task 4 Sound Separation and Event Detection challenge, systems should not use FSD50K labels, even though they may become available upon FSD50K release.

To create mixtures, 10-second clips of sources are convolved with simulated room impulse responses and added together. Each 10-second mixture contains between 1 and 4 sources. Source files longer than 10 seconds are considered "background" sources. Every mixture contains one background source, which is active for the entire duration. We provide a software recipe to create the dataset, the room impulse responses, and the original source audio.
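
The mixing step described above can be sketched in a few lines. This is a minimal illustration, not the dataset recipe itself: the function name `reverberate_and_mix`, its parameters, and the choice to trim each reverberated source to the clip length are assumptions made for the example.

```python
import numpy as np

SAMPLE_RATE = 16000   # FUSS audio is 16 kHz
CLIP_SECONDS = 10     # each mixture is 10 seconds long


def reverberate_and_mix(sources, rirs, clip_len=SAMPLE_RATE * CLIP_SECONDS):
    """Convolve each dry source with its own RIR and sum the results.

    Hypothetical helper: `sources` and `rirs` are equal-length lists of
    1-D float arrays. Returns (mixture, references), where `references`
    has shape (num_sources, clip_len).
    """
    reverberated = []
    for source, rir in zip(sources, rirs):
        # Full convolution, trimmed to the clip length (an assumption
        # made here; the reverb tail past the clip end is discarded).
        wet = np.convolve(source, rir)[:clip_len]
        # Zero-pad if the convolved signal is shorter than the clip.
        wet = np.pad(wet, (0, max(0, clip_len - len(wet))))
        reverberated.append(wet)
    references = np.stack(reverberated)
    # The mixture is the sample-wise sum of the reverberated sources.
    mixture = references.sum(axis=0)
    return mixture, references
```

Because the mixture is an exact sum of the returned references, the references can serve directly as separation targets for that mixture.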

Motivation for use in DCASE2020 Challenge Task 4: This dataset provides a platform to investigate how source separation may help with event detection and vice versa. Previous work has shown that universal sound separation (separation of arbitrary sounds) is possible [3], and that event detection can help with universal sound separation [4]. It remains to be seen whether sound separation can help with event detection. Event detection is more difficult in noisy environments, so separation could be a useful pre-processing step. Data with strong labels for event detection are relatively scarce, especially when restricted to specific classes within a domain. In contrast, source separation data needs no event labels for training and may be more plentiful. In this setting, the idea is to use larger unlabeled separation data to train separation systems, which can serve as a front end to event-detection systems trained on more limited data.

Room simulation: Room impulse responses are simulated using the image method with frequency-dependent walls. Each impulse response corresponds to a rectangular room of random size with random wall materials, where a single microphone and up to 4 sources are placed at random spatial locations.

Recipe for data creation: The data creation recipe starts with scripts, based on scaper [5], to generate mixtures of events with random timing of source events, along with a background source that spans the duration of the mixture clip. The scripts for this are at this GitHub repo.

The data are reverberated using a different room simulation for each mixture. In this simulation each source has its own reverberation corresponding to a different spatial location. The reverberated mixtures are created by summing over the reverberated sources. The dataset recipe scripts support modification, so that participants may remix and augment the training data as desired.

The constituent source files for each mixture are also generated for use as references for training and evaluation.

Note: No attempt was made to remove digital silence from the Freesound source data, so some reference sources may include digital silence, and there are a few mixtures where the background reference is all digital silence. Digital silence can also be observed in the event recognition public evaluation data, so it is important to be able to handle this in practice. Our evaluation scripts handle it by ignoring any reference sources that are silent.
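
One way to skip silent references before scoring is sketched below. This is an illustration, not the released evaluation code: the helper name `active_reference_indices` is hypothetical, and it assumes that "digital silence" means every sample is exactly zero.

```python
import numpy as np


def active_reference_indices(references):
    """Return indices of references that are not all digital silence.

    Hypothetical helper: `references` is array-like with shape
    (num_sources, num_samples). A reference counts as silent only if
    every sample is exactly zero (an assumption made for this sketch).
    """
    references = np.asarray(references)
    return [i for i, ref in enumerate(references) if np.any(ref != 0)]
```

Metrics would then be computed only over the returned indices, so an all-zero background reference does not contribute an undefined score.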

Format: All audio clips are provided as uncompressed 16-bit PCM, 16 kHz, mono audio files.
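
A quick format check on a downloaded clip can be sketched with Python's standard `wave` module. The helper name `is_fuss_format` is hypothetical, and the sketch assumes the clips are stored in WAV containers, which the description above does not state explicitly.

```python
import wave


def is_fuss_format(path):
    """Return True if a WAV file is uncompressed 16-bit PCM, 16 kHz, mono.

    Hypothetical helper; assumes `path` points to a WAV file.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getsampwidth() == 2          # 2 bytes per sample = 16 bit
            and wav.getframerate() == 16000  # 16 kHz sampling rate
            and wav.getnchannels() == 1      # mono
            and wav.getcomptype() == "NONE"  # uncompressed PCM
        )
```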

Data split:  The FUSS dataset is partitioned into "train", "validation", and "eval" sets, following the same splits used in FSD data. Specifically, the train and validation sets are sourced from the FSD50K dev set, and we have ensured that clips in train come from different uploaders than the clips in validation. The eval set is sourced from the FSD50K eval split.

Baseline System: A baseline system for the FUSS dataset is available at dcase2020_fuss_baseline.

License: All audio clips (i.e., in FUSS_fsd_data.tar.gz) used in the preparation of the Free Universal Sound Separation (FUSS) dataset are designated Creative Commons (CC0) and were obtained from freesound.org. The source data in FUSS_fsd_data.tar.gz were selected using labels from the FSD50K corpus, which is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) License.

The FUSS dataset as a whole is a curated, reverberated, mixed, and partitioned preparation, and is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) License. This license is specified in the LICENSE-DATASET file downloaded with the FUSS_license_doc.tar.gz file.

Notes:

• FUSS_baseline_dry_model.tar.gz: baseline separation model trained on non-reverberated (dry) data.
• FUSS_DESED_baseline_dry_2_model.tar.gz: baseline separation model for the DESED task, trained on a mixture of DESED in-domain data and FUSS data.
• FUSS_DESED_baseline_dry_1_model.tar.gz: baseline separation model for the DESED task, trained to separate DESED mixtures from dry FUSS mixtures (DmFm)
• FUSS_DESED_baseline_dry_4_model.tar.gz: baseline separation model for the DESED task, trained to separate DESED background, dry FUSS mixture, and 5 DESED foreground sources with PIT (PIT)
• FUSS_DESED_baseline_dry_4np_model.tar.gz: baseline separation model for the DESED task, trained to separate DESED background, 10 DESED classes, and dry FUSS mixture without PIT (Classwise)
• FUSS_DESED_baseline_dry_6_model.tar.gz: baseline separation model for the DESED task, trained to separate DESED background, 5 DESED foreground sources, 4 dry FUSS sources, with groupwise PIT (GroupPIT)

The names in parentheses are the task names from Table 3 of the following paper: Nicolas Turpault, Scott Wisdom, Hakan Erdogan, John R. Hershey, Romain Serizel, Eduardo Fonseca, Prem Seetharaman, and Justin Salamon, "Improving Sound Event Detection in Domestic Environments using Sound Separation", DCASE 2020.

Files (25.2 GB)

| Name | MD5 | Size |
|---|---|---|
| FUSS_baseline_dry_model.tar.gz | aca2b32125659a420bd5c307e2505e43 | 102.3 MB |
| FUSS_baseline_model.tar.gz | fb803bc4b0253912bfe823d4df508a6b | 102.3 MB |
| FUSS_DESED_baseline_dry_1_model.tar.gz | 17ca2bb89949a117579f5c34afbe443c | 99.9 MB |
| FUSS_DESED_baseline_dry_2_model.tar.gz | 00174838e1454a9692eebe21e1bce616 | 101.0 MB |
| FUSS_DESED_baseline_dry_4_model.tar.gz | a83120258e77955e1a609765483c69b2 | 103.9 MB |
| FUSS_DESED_baseline_dry_4np_model.tar.gz | aa0282b99604040e65613d8047ba98e8 | 107.0 MB |
| FUSS_DESED_baseline_dry_6_model.tar.gz | 7b2252c2f07207edaf37d00739390180 | 106.1 MB |
| FUSS_fsd_data.tar.gz | 79742c73e0d4164f35207e6e74487098 | 1.9 GB |
| | 7d00e8b03601698413045b7293c04fc2 | 4.0 kB |
| FUSS_rir_data.tar.gz | e43c7d6d29332eae4bb193f7a940b194 | 5.8 GB |
| FUSS_ssdata.tar.gz | d1f35069880185056ab1bde5aa1da1e7 | 8.9 GB |
| FUSS_ssdata_reverb.tar.gz | | 7.9 GB |

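
A downloaded archive can be checked against its listed MD5 checksum with a short script. The sketch below uses Python's standard `hashlib`; the helper names `md5_of_file` and `matches_listed_md5` are hypothetical, and the file is read in chunks so that multi-gigabyte archives do not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file's MD5 with a listed value.

    `listed` may be a bare hex digest or carry an "md5:" prefix.
    """
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_md5("FUSS_fsd_data.tar.gz", "md5:79742c73e0d4164f35207e6e74487098")` should return `True` for an intact download of that archive.
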
• [1] Scott Wisdom, Hakan Erdogan, Daniel P. W. Ellis, Romain Serizel, Nicolas Turpault, Eduardo Fonseca, Justin Salamon, Prem Seetharaman, John R. Hershey, "What's All the FUSS About Free Universal Sound Separation Data?", 2020, in preparation.

• [2] Eduardo Fonseca, Jordi Pons, Xavier Favory, Frederic Font Corbera, Dmitry Bogdanov, Andrés Ferraro, Sergio Oramas, Alastair Porter, and Xavier Serra. "Freesound Datasets: A Platform for the Creation of Open Audio Datasets." International Society for Music Information Retrieval Conference (ISMIR), pp. 486–493. Suzhou, China, 2017.

• [3] Ilya Kavalerov, Scott Wisdom, Hakan Erdogan, Brian Patton, Kevin Wilson, Jonathan Le Roux, and John R. Hershey. "Universal Sound Separation." IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 175–179. New Paltz, NY, USA, 2019.

• [4] Efthymios Tzinis, Scott Wisdom, John R. Hershey, Aren Jansen, and Daniel P. W. Ellis. "Improving Universal Sound Separation Using Sound Classification." IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), 2020.

• [5] J. Salamon, D. MacConnell, M. Cartwright, P. Li, and J. P. Bello. "Scaper: A Library for Soundscape Synthesis and Augmentation." IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). New Paltz, NY, USA, Oct. 2017.
