4012661
doi
10.5281/zenodo.4012661
oai:zenodo.org:4012661
user-dcase
user-sigsep
Romain Serizel
LORIA
Nicolas Turpault
INRIA
Justin Salamon
Adobe Research
Prem Seetharaman
Northwestern University
Eduardo Fonseca
Universitat Pompeu Fabra (UPF)
Frederic Font Corbera
Universitat Pompeu Fabra (UPF)
Hakan Erdogan
Google Research
Dan Ellis
Google Research
John R. Hershey
Google Research
Free Universal Sound Separation Dataset
Scott Wisdom
Google Research
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
sound separation
<p>The Free Universal Sound Separation (FUSS) Dataset is a database of arbitrary sound mixtures and source-level references, for use in experiments on arbitrary sound separation. </p>
<p>This is the official sound separation data for the DCASE2020 Challenge Task 4: Sound Event Detection and Separation in Domestic Environments.</p>
<p><strong>Citation: </strong>If you use the FUSS dataset or part of it, please cite our paper describing the dataset and baseline [1]. FUSS is based on <a href="https://annotator.freesound.org/fsd/">FSD data</a>, so please also cite [2].</p>
<p><strong>Overview: </strong>FUSS audio data is sourced from a pre-release of the <a href="https://annotator.freesound.org/fsd/">Freesound Dataset</a> known as FSD50K, a sound event dataset composed of Freesound content annotated with labels from the AudioSet ontology. Using the FSD50K labels, these source files have been screened so that they likely contain only a single type of sound. Labels are not provided for these source files and are not considered part of the challenge. For the purposes of the DCASE Task 4 Sound Separation and Event Detection challenge, systems should not use FSD50K labels, even though they may become available upon FSD50K release.</p>
<p>To create mixtures, 10 second clips of sources are convolved with simulated room impulse responses and added together. Each 10 second mixture contains between 1 and 4 sources. Source files longer than 10 seconds are considered "background" sources. Every mixture contains one background source, which is active for the entire duration. We provide: a software recipe to create the dataset, the room impulse responses, and the original source audio.</p>
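<p>As a rough sketch of this mixing step (illustrative only; <code>make_mixture</code> and its arguments are hypothetical names, not part of the actual recipe scripts), each source is convolved with its own room impulse response, foreground events are placed at random onsets within the 10 second clip, and the reverberated sources are summed:</p>

```python
import numpy as np

fs, dur = 16000, 10.0          # FUSS clips: 16 kHz, 10 seconds
n_samples = int(fs * dur)

def make_mixture(background, events, rirs, rng=np.random.default_rng(0)):
    """Toy version of the FUSS mixing step: reverberate each source with
    its own RIR, place foreground events at random onsets, and sum.
    `background` must cover the full clip; returns (mixture, references)."""
    mix = np.zeros(n_samples)
    refs = []
    # The background source is active for the entire duration.
    wet_bg = np.convolve(background, rirs[0])[:n_samples]
    refs.append(wet_bg)
    mix += wet_bg
    # Each foreground event gets a random onset and its own RIR.
    for src, rir in zip(events, rirs[1:]):
        wet = np.convolve(src, rir)
        onset = rng.integers(0, n_samples - len(wet) + 1) if len(wet) < n_samples else 0
        ref = np.zeros(n_samples)
        ref[onset:onset + len(wet)] = wet[:n_samples - onset]
        refs.append(ref)
        mix += ref
    return mix, refs
```

<p>By construction the mixture is exactly the sum of the source-level references, which is what makes them usable as separation targets.</p>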
<p><strong>Motivation for use in DCASE2020 Challenge Task 4: </strong> This dataset provides a platform to investigate how source separation may help with event detection and vice versa. Previous work has shown that universal sound separation (separation of arbitrary sounds) is possible [3], and that event detection can help with universal sound separation [4]. It remains to be seen whether sound separation can help with event detection. Event detection is more difficult in noisy environments, and so separation could be a useful pre-processing step. Data with strong labels for event detection are relatively scarce, especially when restricted to specific classes within a domain. In contrast, source separation data needs no event labels for training, and may be more plentiful. In this setting, the idea is to utilize larger unlabeled separation data to train separation systems, which can serve as a front-end to event-detection systems trained on more limited data.</p>
<p><strong>Room simulation: </strong>Room impulse responses are simulated using the image method with frequency-dependent walls. Each impulse response corresponds to a rectangular room of random size with random wall materials, in which a single microphone and up to 4 sources are placed at random spatial locations.</p>
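<p>A toy, frequency-independent version of the image method for a shoebox room might look like the following (the real simulation uses frequency-dependent wall materials; <code>shoebox_rir</code> and every parameter here are illustrative, not the actual simulator):</p>

```python
import numpy as np

def shoebox_rir(room, src, mic, fs=16000, beta=0.8, max_order=4, rir_len=4000):
    """Toy image-method RIR for a rectangular room with a single,
    frequency-independent reflection coefficient `beta` per wall bounce.
    `room` is (Lx, Ly, Lz); `src` and `mic` are 3-D positions in meters."""
    rir = np.zeros(rir_len)
    c = 343.0  # speed of sound, m/s
    orders = range(-max_order, max_order + 1)
    for nx in orders:
        for ny in orders:
            for nz in orders:
                img = np.empty(3)
                n_refl = 0
                for d, n in enumerate((nx, ny, nz)):
                    # Image positions along each axis: 2qL + src or 2qL - src.
                    if n % 2 == 0:
                        img[d] = n * room[d] + src[d]
                    else:
                        img[d] = (n + 1) * room[d] - src[d]
                    n_refl += abs(n)
                dist = np.linalg.norm(img - mic)
                tap = int(round(dist / c * fs))
                if tap < rir_len:
                    # Attenuate by reflections and 1/r spherical spreading.
                    rir[tap] += beta ** n_refl / max(dist, 1e-3)
    return rir
```

<p>The direct path produces the first tap of the response, at a delay of distance over the speed of sound; reflected images arrive later with extra attenuation per bounce.</p>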
<p><strong>Recipe for data creation: </strong>The data creation recipe starts with scripts, based on <a href="https://github.com/justinsalamon/scaper">scaper</a> [5], that generate mixtures of events with random timing of source events, along with a background source that spans the duration of the mixture clip. The scripts for this are at <a href="https://github.com/google-research/sound-separation/tree/master/datasets/fuss">this GitHub repo</a>.</p>
<p>The data are reverberated using a different room simulation for each mixture. In this simulation each source has its own reverberation corresponding to a different spatial location. The reverberated mixtures are created by summing over the reverberated sources. The dataset recipe scripts support modification, so that participants may remix and augment the training data as desired.</p>
<p>The constituent source files for each mixture are also generated for use as references for training and evaluation.</p>
<p>Note: no attempt was made to remove digital silence from the Freesound source data, so some reference sources may include digital silence, and there are a few mixtures where the background reference is all digital silence. Digital silence can also be observed in the event recognition public evaluation data, so it is important to be able to handle this in practice. Our evaluation scripts handle it by ignoring any reference sources that are silent.</p>
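<p>Skipping silent references amounts to a simple all-zeros check before scoring; a hedged sketch (<code>drop_silent_references</code> is an illustrative name, not the actual evaluation code):</p>

```python
import numpy as np

def drop_silent_references(refs, eps=0.0):
    """Discard reference sources that are digital silence (all zeros),
    mirroring the evaluation behavior of ignoring silent references."""
    return [r for r in refs if np.max(np.abs(r)) > eps]
```

<p>With <code>eps=0.0</code> only exact digital silence is dropped; a small positive threshold would also drop near-silent references, which the FUSS evaluation does not do.</p>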
<p><strong>Format: </strong>All audio clips are provided as uncompressed PCM 16 bit, 16 kHz, mono audio files.</p>
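<p>Writing audio in this exact format needs only the Python standard library; a minimal sketch (the function name is illustrative):</p>

```python
import math
import struct
import wave

def write_fuss_style_wav(path, samples, fs=16000):
    """Write mono 16-bit PCM at 16 kHz, the format of all FUSS clips.
    `samples` is an iterable of floats in [-1, 1]."""
    frames = b''.join(
        struct.pack('<h', int(max(-1.0, min(1.0, s)) * 32767))
        for s in samples)
    with wave.open(path, 'wb') as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(fs)  # 16 kHz
        w.writeframes(frames)
```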
<p><strong>Data split: </strong> The FUSS dataset is partitioned into "train", "validation", and "eval" sets, following the same splits used in FSD data. Specifically, the train and validation sets are sourced from the FSD50K dev set, and we have ensured that clips in train come from different uploaders than the clips in validation. The eval set is sourced from the FSD50K eval split.</p>
<p><strong>Baseline System: </strong>A baseline system for the FUSS dataset is available at <a href="https://github.com/google-research/sound-separation/tree/master/datasets/fuss">dcase2020_fuss_baseline</a>.</p>
<p><strong>License: </strong>All audio clips (i.e., in FUSS_fsd_data.tar.gz) used in the preparation of the Free Universal Sound Separation (FUSS) dataset are designated Creative Commons 0 (CC0) and were obtained from <a href="http://freesound.org">freesound.org</a>. The source data in FUSS_fsd_data.tar.gz were selected using labels from the <a href="https://annotator.freesound.org/fsd/">FSD50K corpus</a>, which is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.</p>
<p>The FUSS dataset as a whole is a curated, reverberated, mixed, and partitioned preparation, and is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This license is specified in the `LICENSE-DATASET` file included in the `FUSS_license_doc.tar.gz` download.</p>
<p><strong>Notes:</strong></p>
<p>Added in v1.2: </p>
<ul>
<li>FUSS_baseline_dry_model.tar.gz: baseline separation model trained on non-reverberated (dry) data. </li>
<li>FUSS_DESED_baseline_dry_2_model.tar.gz: baseline separation model for the DESED task, trained on a mixture of DESED in-domain data and FUSS data</li>
</ul>
<p>Added in v1.3:</p>
<ul>
<li>FUSS_DESED_baseline_dry_1_model.tar.gz: baseline separation model for the DESED task, trained to separate DESED mixtures from dry FUSS mixtures (DmFm)</li>
<li>FUSS_DESED_baseline_dry_4_model.tar.gz: baseline separation model for the DESED task, trained to separate DESED background, dry FUSS mixture, and 5 DESED foreground sources with PIT (PIT)</li>
<li>FUSS_DESED_baseline_dry_4np_model.tar.gz: baseline separation model for the DESED task, trained to separate DESED background, 10 DESED classes, and dry FUSS mixture without PIT (Classwise)</li>
<li>FUSS_DESED_baseline_dry_6_model.tar.gz: baseline separation model for the DESED task, trained to separate DESED background, 5 DESED foreground sources, 4 dry FUSS sources, with groupwise PIT (GroupPIT)</li>
</ul>
<p>The names in parentheses are the task names from Table 3 of the following paper: <a href="https://arxiv.org/pdf/2007.03932.pdf">Nicolas Turpault, Scott Wisdom, Hakan Erdogan, John R. Hershey, Romain Serizel, Eduardo Fonseca, Prem Seetharaman, and Justin Salamon, "Improving Sound Event Detection in Domestic Environments using Sound Separation", DCASE 2020.</a></p>
Zenodo
2020-03-04
info:eu-repo/semantics/other
3694383
user-dcase
user-sigsep
1.3
1599094765.719493
7890709248
md5:0ee3f1229194b5b7ab2cdf8a12add0b6
https://zenodo.org/records/4012661/files/FUSS_ssdata_reverb.tar.gz
8892152218
md5:d1f35069880185056ab1bde5aa1da1e7
https://zenodo.org/records/4012661/files/FUSS_ssdata.tar.gz
103923588
md5:a83120258e77955e1a609765483c69b2
https://zenodo.org/records/4012661/files/FUSS_DESED_baseline_dry_4_model.tar.gz
107033352
md5:aa0282b99604040e65613d8047ba98e8
https://zenodo.org/records/4012661/files/FUSS_DESED_baseline_dry_4np_model.tar.gz
106126813
md5:7b2252c2f07207edaf37d00739390180
https://zenodo.org/records/4012661/files/FUSS_DESED_baseline_dry_6_model.tar.gz
1928324222
md5:79742c73e0d4164f35207e6e74487098
https://zenodo.org/records/4012661/files/FUSS_fsd_data.tar.gz
4022
md5:7d00e8b03601698413045b7293c04fc2
https://zenodo.org/records/4012661/files/FUSS_license_doc.tar.gz
5769611010
md5:e43c7d6d29332eae4bb193f7a940b194
https://zenodo.org/records/4012661/files/FUSS_rir_data.tar.gz
102324048
md5:aca2b32125659a420bd5c307e2505e43
https://zenodo.org/records/4012661/files/FUSS_baseline_dry_model.tar.gz
102342954
md5:fb803bc4b0253912bfe823d4df508a6b
https://zenodo.org/records/4012661/files/FUSS_baseline_model.tar.gz
99893533
md5:17ca2bb89949a117579f5c34afbe443c
https://zenodo.org/records/4012661/files/FUSS_DESED_baseline_dry_1_model.tar.gz
100993980
md5:00174838e1454a9692eebe21e1bce616
https://zenodo.org/records/4012661/files/FUSS_DESED_baseline_dry_2_model.tar.gz
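<p>Each archive above is listed with its size and md5 checksum; a small helper to verify a downloaded file against its listed digest (the function name is illustrative):</p>

```python
import hashlib

def md5sum(path, chunk=1 << 20):
    """Compute the md5 digest of a file in streaming fashion, formatted
    like the `md5:...` entries in this record."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            h.update(block)
    return 'md5:' + h.hexdigest()
```

<p>After downloading, compare the result against the checksum listed next to the archive before unpacking.</p>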
public
10.5281/zenodo.3694383
isVersionOf
doi