Dataset Open Access

FSDnoisy18k

Eduardo Fonseca; Mercedes Collado; Manoj Plakal; Daniel P. W. Ellis; Frederic Font; Xavier Favory; Xavier Serra


Citation Style Language JSON Export

{
  "publisher": "Zenodo", 
  "DOI": "10.5281/zenodo.2529934", 
  "title": "FSDnoisy18k", 
  "issued": {
    "date-parts": [
      [
        2019, 
        1, 
        3
      ]
    ]
  }, 
  "abstract": "<p>FSDnoisy18k is an audio dataset collected with the aim of fostering the investigation of label noise in sound event classification. It contains 42.5 hours of audio across 20 sound classes, including a small amount of manually-labeled data and a larger quantity of real-world noisy data.</p>\n\n<p><strong>Data curators</strong></p>\n\n<p>Eduardo Fonseca and Mercedes Collado</p>\n\n<p><strong>Contact</strong></p>\n\n<p>You are welcome to contact Eduardo Fonseca should you have any questions at eduardo.fonseca@upf.edu.</p>\n\n<p><strong>Citation</strong></p>\n\n<p>If you use this dataset or part of it, please cite the following <strong><a href=\"https://arxiv.org/abs/1901.01189\">ICASSP 2019 paper</a></strong>:</p>\n\n<blockquote>\n<p>Eduardo Fonseca, Manoj Plakal, Daniel P. W. Ellis, Frederic Font, Xavier Favory, and Xavier Serra, &ldquo;Learning Sound Event Classifiers from Web Audio with Noisy Labels&rdquo;, arXiv preprint arXiv:1901.01189, 2019</p>\n</blockquote>\n\n<p>You can also consider citing our <strong><a href=\"https://repositori.upf.edu/bitstream/handle/10230/33299/fonseca_ismir17_freesound.pdf?sequence=1&amp;isAllowed=y\">ISMIR 2017 paper</a></strong> that describes the <a href=\"https://annotator.freesound.org/\">Freesound Annotator</a>, which was used to gather the manual annotations included in FSDnoisy18k:</p>\n\n<blockquote>\n<p>Eduardo Fonseca, Jordi Pons, Xavier Favory, Frederic Font, Dmitry Bogdanov, Andres Ferraro, Sergio Oramas, Alastair Porter, and Xavier Serra, &ldquo;Freesound Datasets: A Platform for the Creation of Open Audio Datasets&rdquo;, In Proceedings of the 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017</p>\n</blockquote>\n\n<p><strong>FSDnoisy18k description</strong></p>\n\n<p>What follows&nbsp;is a summary of the <strong>most basic aspects</strong> of FSDnoisy18k. 
For a complete description of FSDnoisy18k, make sure to check:</p>\n\n<ul>\n\t<li>the <strong>FSDnoisy18k companion site</strong>: <a href=\"http://www.eduardofonseca.net/FSDnoisy18k/\">http://www.eduardofonseca.net/FSDnoisy18k/</a></li>\n\t<li>the description provided in <strong>Section 2 of our ICASSP 2019 paper</strong></li>\n</ul>\n\n<p>FSDnoisy18k is an audio dataset collected with the aim of fostering the investigation of label noise in sound event classification. It contains 42.5 hours of audio across 20 sound classes, including a small amount of manually-labeled data and a larger quantity of real-world noisy data.</p>\n\n<p>The source of audio content is <a href=\"https://freesound.org\">Freesound</a>&mdash;a sound sharing site created and maintained by the <a href=\"https://www.upf.edu/web/mtg\">Music Technology Group</a>, hosting over 400,000 clips uploaded by its community of users, who additionally provide some basic metadata (e.g., tags and title). The 20 classes of FSDnoisy18k are drawn from the <a href=\"https://research.google.com/audioset/ontology/index.html\">AudioSet Ontology</a> and are selected based on data availability as well as on their suitability to allow the study of label noise. The 20 classes are: &quot;Acoustic guitar&quot;, &quot;Bass guitar&quot;, &quot;Clapping&quot;, &quot;Coin (dropping)&quot;, &quot;Crash cymbal&quot;, &quot;Dishes, pots, and pans&quot;, &quot;Engine&quot;, &quot;Fart&quot;, &quot;Fire&quot;, &quot;Fireworks&quot;, &quot;Glass&quot;, &quot;Hi-hat&quot;, &quot;Piano&quot;, &quot;Rain&quot;, &quot;Slam&quot;, &quot;Squeak&quot;, &quot;Tearing&quot;, &quot;Walk, footsteps&quot;, &quot;Wind&quot;, and &quot;Writing&quot;. FSDnoisy18k was created with the <a href=\"https://annotator.freesound.org/\">Freesound Annotator</a>, which is a platform for the collaborative creation of open audio datasets.</p>\n\n<p>We defined a <em>clean</em> portion of the dataset consisting of correct and complete labels. 
The remaining portion is referred to as the <em>noisy</em> portion. Each clip in the dataset has a single ground truth label (singly-labeled data).</p>\n\n<p>The <strong>clean portion</strong> of the data consists of audio clips whose labels are rated as present in the clip and predominant (almost all with full inter-annotator agreement), meaning that the label is correct and, in most cases, there is no additional acoustic material other than the labeled class. A few clips may contain some additional sound events, but they occur in the background and do not belong to any of the 20 target classes. This is more common for some classes that rarely occur alone, e.g., &ldquo;Fire&rdquo;, &ldquo;Glass&rdquo;, &ldquo;Wind&rdquo; or &ldquo;Walk, footsteps&rdquo;.</p>\n\n<p>The <strong>noisy portion</strong> of the data consists of audio clips that received no human validation. In this case, they are categorized on the basis of the user-provided tags in Freesound. Hence, the noisy portion features a certain amount of label noise.</p>\n\n<p><strong>Code</strong></p>\n\n<p>We&#39;ve released the code for our ICASSP 2019 paper at <a href=\"https://github.com/edufonseca/icassp19\">https://github.com/edufonseca/icassp19</a>. The framework comprises all the basic stages: feature extraction, training, inference and evaluation. After loading the FSDnoisy18k dataset, log-mel energies are computed and a CNN baseline is trained and evaluated. The code also allows testing four noise-robust loss functions. Please check our paper for more details.</p>\n\n<p><strong>Label noise characteristics</strong></p>\n\n<p>FSDnoisy18k features real label noise that is representative of audio data retrieved from the web, particularly from Freesound. The analysis of a <strong>per-class, random, 15% of the noisy portion</strong> of FSDnoisy18k revealed that roughly 40% of the analyzed labels are correct and complete, whereas 60% of the labels show some type of label noise. 
Please check the <a href=\"http://www.eduardofonseca.net/FSDnoisy18k/\">FSDnoisy18k companion site</a> for a detailed characterization of the label noise in the dataset, including a taxonomy of label noise for singly-labeled data as well as a per-class description of the label noise.</p>\n\n<p><strong>FSDnoisy18k basic characteristics</strong></p>\n\n<p>The dataset&#39;s most relevant characteristics are as follows:</p>\n\n<ul>\n\t<li>FSDnoisy18k contains 18,532 audio clips (42.5h) unequally distributed among the 20 aforementioned classes drawn from the AudioSet Ontology.</li>\n\t<li>The audio clips are provided as uncompressed PCM 16 bit, 44.1 kHz, mono audio files.</li>\n\t<li>The audio clips are of variable length ranging from 300ms to 30s, and each clip has a single ground truth label (singly-labeled data).</li>\n\t<li>The dataset is split into a <strong>test set</strong> and a <strong>train set</strong>. The test set is drawn entirely from the clean portion, while the remainder of data forms the train set.</li>\n\t<li>The <strong>train set</strong> is composed of 17,585 clips (41.1h) unequally distributed among the 20 classes. It features a <strong>clean subset</strong> and a <strong>noisy subset</strong>. In terms of number of clips their proportion is 10%/90%, whereas in terms of duration the proportion is slightly more extreme (6%/94%). The per-class percentage of clean data within the train set is also imbalanced, ranging from 6.1% to 22.4%. The number of audio clips per class ranges from 51 to 170, and from 250 to 1000 in the clean and noisy subsets, respectively. Further, a noisy small subset is defined, which includes an amount of (noisy) data comparable (in terms of duration) to that of the clean subset.</li>\n\t<li>The <strong>test set</strong> is composed of 947 clips (1.4h) that belong to the clean portion of the data. Its class distribution is similar to that of the clean subset of the train set. 
The number of per-class audio clips in the test set ranges from 30 to 72. The test set enables a multi-class classification problem.</li>\n\t<li>FSDnoisy18k is an expandable dataset that features per-class varying types and amounts of label noise. The dataset allows investigation of label noise as well as other approaches, from semi-supervised learning (e.g., self-training) to learning with minimal supervision.</li>\n</ul>\n\n<p><strong>License</strong></p>\n\n<p>FSDnoisy18k has licenses at two different levels, as explained next. All sounds in Freesound are released under Creative Commons (CC) licenses, and each audio clip has its own license as defined by the audio clip uploader in Freesound. In particular, all Freesound clips included in FSDnoisy18k are released under either <a href=\"https://creativecommons.org/licenses/by/3.0/\">CC-BY</a> or <a href=\"https://creativecommons.org/publicdomain/zero/1.0/\">CC0</a>. For attribution purposes and to facilitate attribution of these files to third parties, we include a list of the audio clips and their corresponding licenses in the <code>LICENSE-INDIVIDUAL-CLIPS</code> file downloaded with the dataset.</p>\n\n<p>In addition, FSDnoisy18k as a whole is the result of a curation process and it has an additional license. FSDnoisy18k is released under <a href=\"https://creativecommons.org/licenses/by/4.0/\">CC-BY</a>. 
This license is specified in the <code>LICENSE-DATASET</code> file downloaded with the dataset.</p>\n\n<p><strong>Files</strong></p>\n\n<p>FSDnoisy18k can be downloaded as a series of zip files with the following directory structure:</p>\n\n<pre>root\n\u2502  \n\u2514\u2500\u2500\u2500FSDnoisy18k.audio_train/          Audio clips in the train set\n\u2502   \n\u2514\u2500\u2500\u2500FSDnoisy18k.audio_test/           Audio clips in the test set\n\u2502   \n\u2514\u2500\u2500\u2500FSDnoisy18k.meta/                 Files for evaluation setup\n\u2502   \u2502            \n\u2502   \u2514\u2500\u2500\u2500train.csv                     Data split and ground truth for the train set\n\u2502   \u2502            \n\u2502   \u2514\u2500\u2500\u2500test.csv                      Ground truth for the test set         \n\u2502   \n\u2514\u2500\u2500\u2500FSDnoisy18k.doc/\n    \u2502            \n    \u2514\u2500\u2500\u2500README.md                     The dataset description file that you are reading\n    \u2502            \n    \u2514\u2500\u2500\u2500LICENSE-DATASET               License of the FSDnoisy18k dataset as an entity   \n    \u2502            \n    \u2514\u2500\u2500\u2500LICENSE-INDIVIDUAL-CLIPS.csv  Licenses of the individual audio clips from Freesound \n</pre>\n\n<p>Each row (i.e. 
audio clip) of the <code>train.csv</code> file contains the following information:</p>\n\n<ul>\n\t<li><code>fname</code>: the file name</li>\n\t<li><code>label</code>: the audio classification label (ground truth)</li>\n\t<li><code>aso_id</code>: the id of the corresponding category as per the AudioSet Ontology</li>\n\t<li><code>manually_verified</code>: Boolean (1 or 0) flag to indicate whether the clip belongs to the <strong>clean portion (1)</strong>, or to the <strong>noisy portion (0)</strong> of the train set</li>\n\t<li><code>noisy_small</code>: Boolean (1 or 0) flag to indicate whether the clip belongs to the <strong>noisy_small portion (1)</strong> of the train set</li>\n</ul>\n\n<p>Each row (i.e. audio clip) of the <code>test.csv</code> file contains the following information:</p>\n\n<ul>\n\t<li><code>fname</code>: the file name</li>\n\t<li><code>label</code>: the audio classification label (ground truth)</li>\n\t<li><code>aso_id</code>: the id of the corresponding category as per the AudioSet Ontology</li>\n</ul>\n\n<p><strong>Links</strong></p>\n\n<p>Source code for our preprint: <a href=\"https://github.com/edufonseca/icassp19\">https://github.com/edufonseca/icassp19</a><br>\nFreesound Annotator: <a href=\"https://annotator.freesound.org/\">https://annotator.freesound.org/</a><br>\nFreesound: <a href=\"https://freesound.org\">https://freesound.org</a><br>\nEduardo Fonseca&rsquo;s personal website: <a href=\"http://www.eduardofonseca.net/\">http://www.eduardofonseca.net/</a></p>\n\n<p><strong>Acknowledgments</strong></p>\n\n<p>This work is partially supported by the European Union&rsquo;s Horizon 2020 research and innovation programme under grant agreement No 688382 <a href=\"https://www.audiocommons.org/\">AudioCommons</a>. Eduardo Fonseca is also sponsored by a <a href=\"https://ai.googleblog.com/2018/03/google-faculty-research-awards-2017.html\">Google Faculty Research Award 2017</a>. 
We thank everyone who contributed to FSDnoisy18k with annotations.</p>", 
  "author": [
    {
      "family": "Eduardo Fonseca"
    }, 
    {
      "family": "Mercedes Collado"
    }, 
    {
      "family": "Manoj Plakal"
    }, 
    {
      "family": "Daniel P. W. Ellis"
    }, 
    {
      "family": "Frederic Font"
    }, 
    {
      "family": "Xavier Favory"
    }, 
    {
      "family": "Xavier Serra"
    }
  ], 
  "version": "1.0", 
  "type": "dataset", 
  "id": "2529934"
}
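The metadata layout described in the abstract (a `train.csv` with `fname`, `label`, `aso_id`, `manually_verified`, `noisy_small` columns and a `test.csv` with the first three) can be loaded with a few lines of standard-library Python. This is a minimal sketch, not part of the official release; the function names and the `FSDnoisy18k.meta` path argument are illustrative assumptions based on the directory structure above.

```python
import csv
from pathlib import Path


def load_split(meta_dir):
    """Load FSDnoisy18k metadata from a directory containing
    train.csv and test.csv (e.g., FSDnoisy18k.meta/).

    Returns (train_rows, test_rows) as lists of dicts, with the
    0/1 string flags in train.csv converted to ints.
    """
    meta_dir = Path(meta_dir)
    with open(meta_dir / "train.csv", newline="") as f:
        train = list(csv.DictReader(f))
    for row in train:
        row["manually_verified"] = int(row["manually_verified"])
        row["noisy_small"] = int(row["noisy_small"])
    with open(meta_dir / "test.csv", newline="") as f:
        test = list(csv.DictReader(f))
    return train, test


def clean_noisy_partition(train):
    """Split the train rows into the clean subset
    (manually_verified == 1) and the noisy subset (== 0)."""
    clean = [r for r in train if r["manually_verified"] == 1]
    noisy = [r for r in train if r["manually_verified"] == 0]
    return clean, noisy
```

From there, audio paths are obtained by joining each `fname` with the `FSDnoisy18k.audio_train/` or `FSDnoisy18k.audio_test/` directory, and the `noisy_small` flag selects the duration-matched noisy subset mentioned in the description.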