FSDnoisy18k
Creators
- 1. Music Technology Group, Universitat Pompeu Fabra, Barcelona
- 2. Google, Inc., New York, NY, USA
Description
FSDnoisy18k is an audio dataset collected with the aim of fostering the investigation of label noise in sound event classification. It contains 42.5 hours of audio across 20 sound classes, including a small amount of manually-labeled data and a larger quantity of real-world noisy data.
Data curators
Eduardo Fonseca and Mercedes Collado
Contact
You are welcome to contact Eduardo Fonseca should you have any questions at eduardo.fonseca@upf.edu.
Citation
If you use this dataset or part of it, please cite the following ICASSP 2019 paper:
Eduardo Fonseca, Manoj Plakal, Daniel P. W. Ellis, Frederic Font, Xavier Favory, and Xavier Serra, “Learning Sound Event Classifiers from Web Audio with Noisy Labels”, arXiv preprint arXiv:1901.01189, 2019
You can also consider citing our ISMIR 2017 paper that describes the Freesound Annotator, which was used to gather the manual annotations included in FSDnoisy18k:
Eduardo Fonseca, Jordi Pons, Xavier Favory, Frederic Font, Dmitry Bogdanov, Andres Ferraro, Sergio Oramas, Alastair Porter, and Xavier Serra, “Freesound Datasets: A Platform for the Creation of Open Audio Datasets”, In Proceedings of the 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017
FSDnoisy18k description
What follows is a summary of the most basic aspects of FSDnoisy18k. For a complete description of FSDnoisy18k, make sure to check:
- the FSDnoisy18k companion site: http://www.eduardofonseca.net/FSDnoisy18k/
- the description provided in Section 2 of our ICASSP 2019 paper
FSDnoisy18k is an audio dataset collected with the aim of fostering the investigation of label noise in sound event classification. It contains 42.5 hours of audio across 20 sound classes, including a small amount of manually-labeled data and a larger quantity of real-world noisy data.
The source of audio content is Freesound—a sound sharing site created an maintained by the Music Technology Group hosting over 400,000 clips uploaded by its community of users, who additionally provide some basic metadata (e.g., tags, and title). The 20 classes of FSDnoisy18k are drawn from the AudioSet Ontology and are selected based on data availability as well as on their suitability to allow the study of label noise. The 20 classes are: "Acoustic guitar", "Bass guitar", "Clapping", "Coin (dropping)", "Crash cymbal", "Dishes, pots, and pans", "Engine", "Fart", "Fire", "Fireworks", "Glass", "Hi-hat", "Piano", "Rain", "Slam", "Squeak", "Tearing", "Walk, footsteps", "Wind", and "Writing". FSDnoisy18k was created with the Freesound Annotator, which is a platform for the collaborative creation of open audio datasets.
We defined a clean portion of the dataset consisting of correct and complete labels. The remaining portion is referred to as the noisy portion. Each clip in the dataset has a single ground truth label (singly-labeled data).
The clean portion of the data consists of audio clips whose labels are rated as present in the clip and predominant (almost all with full inter-annotator agreement), meaning that the label is correct and, in most cases, there is no additional acoustic material other than the labeled class. A few clips may contain some additional sound events, but they occur in the background and do not belong to any of the 20 target classes. This is more common for some classes that rarely occur alone, e.g., “Fire”, “Glass”, “Wind” or “Walk, footsteps”.
The noisy portion of the data consists of audio clips that received no human validation. In this case, they are categorized on the basis of the user-provided tags in Freesound. Hence, the noisy portion features a certain amount of label noise.
Code
We've released the code for our ICASSP 2019 paper at https://github.com/edufonseca/icassp19. The framework comprises all the basic stages: feature extraction, training, inference and evaluation. After loading the FSDnoisy18k dataset, log-mel energies are computed and a CNN baseline is trained and evaluated. The code also allows to test four noise-robust loss functions. Please check our paper for more details.
Label noise characteristics
FSDnoisy18k features real label noise that is representative of audio data retrieved from the web, particularly from Freesound. The analysis of a per-class, random, 15% of the noisy portion of FSDnoisy18k revealed that roughly 40% of the analyzed labels are correct and complete, whereas 60% of the labels show some type of label noise. Please check the FSDnoisy18k companion site for a detailed characterization of the label noise in the dataset, including a taxonomy of label noise for singly-labeled data as well as a per-class description of the label noise.
FSDnoisy18k basic characteristics
The dataset most relevant characteristics are as follows:
- FSDnoisy18k contains 18,532 audio clips (42.5h) unequally distributed in the 20 aforementioned classes drawn from the AudioSet Ontology.
- The audio clips are provided as uncompressed PCM 16 bit, 44.1 kHz, mono audio files.
- The audio clips are of variable length ranging from 300ms to 30s, and each clip has a single ground truth label (singly-labeled data).
- The dataset is split into a test set and a train set. The test set is drawn entirely from the clean portion, while the remainder of data forms the train set.
- The train set is composed of 17,585 clips (41.1h) unequally distributed among the 20 classes. It features a clean subset and a noisy subset. In terms of number of clips their proportion is 10%/90%, whereas in terms of duration the proportion is slightly more extreme (6%/94%). The per-class percentage of clean data within the train set is also imbalanced, ranging from 6.1% to 22.4%. The number of audio clips per class ranges from 51 to 170, and from 250 to 1000 in the clean and noisy subsets, respectively. Further, a noisy small subset is defined, which includes an amount of (noisy) data comparable (in terms of duration) to that of the clean subset.
- The test set is composed of 947 clips (1.4h) that belong to the clean portion of the data. Its class distribution is similar to that of the clean subset of the train set. The number of per-class audio clips in the test set ranges from 30 to 72. The test set enables a multi-class classification problem.
- FSDnoisy18k is an expandable dataset that features a per-class varying degree of types and amount of label noise. The dataset allows investigation of label noise as well as other approaches, from semi-supervised learning, e.g., self-training to learning with minimal supervision.
License
FSDnoisy18k has licenses at two different levels, as explained next. All sounds in Freesound are released under Creative Commons (CC) licenses, and each audio clip has its own license as defined by the audio clip uploader in Freesound. In particular, all Freesound clips included in FSDnoisy18k are released under either CC-BY or CC0. For attribution purposes and to facilitate attribution of these files to third parties, we include a relation of audio clips and their corresponding license in the LICENSE-INDIVIDUAL-CLIPS
file downloaded with the dataset.
In addition, FSDnoisy18k as a whole is the result of a curation process and it has an additional license. FSDnoisy18k is released under CC-BY. This license is specified in the LICENSE-DATASET
file downloaded with the dataset.
Files
FSDnoisy18k can be downloaded as a series of zip files with the following directory structure:
root │ └───FSDnoisy18k.audio_train/ Audio clips in the train set │ └───FSDnoisy18k.audio_test/ Audio clips in the test set │ └───FSDnoisy18k.meta/ Files for evaluation setup │ │ │ └───train.csv Data split and ground truth for the train set │ │ │ └───test.csv Ground truth for the test set │ └───FSDnoisy18k.doc/ │ └───README.md The dataset description file that you are reading │ └───LICENSE-DATASET License of the FSDnoisy18k dataset as an entity │ └───LICENSE-INDIVIDUAL-CLIPS.csv Licenses of the individual audio clips from Freesound
Each row (i.e. audio clip) of the train.csv
file contains the following information:
fname
: the file namelabel
: the audio classification label (ground truth)aso_id
: the id of the corresponding category as per the AudioSet Ontologymanually_verified
: Boolean (1 or 0) flag to indicate whether the clip belongs to the clean portion (1), or to the noisy portion (0) of the train setnoisy_small
: Boolean (1 or 0) flag to indicate whether the clip belongs to the noisy_small portion (1) of the train set
Each row (i.e. audio clip) of the test.csv
file contains the following information:
fname
: the file namelabel
: the audio classification label (ground truth)aso_id
: the id of the corresponding category as per the AudioSet Ontology
Links
Source code for our preprint: https://github.com/edufonseca/icassp19
Freesound Annotator: https://annotator.freesound.org/
Freesound: https://freesound.org
Eduardo Fonseca’s personal website: http://www.eduardofonseca.net/
Acknowledgments
This work is partially supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 688382 AudioCommons. Eduardo Fonseca is also sponsored by a Google Faculty Research Award 2017. We thank everyone who contributed to FSDnoisy18k with annotations.
Files
FSDnoisy18k.audio_test.zip
Additional details
Related works
- Is part of
- https://arxiv.org/abs/1901.01189 (URL)