Eduardo Fonseca
Xavier Favory
Jordi Pons
Frederic Font
Manoj Plakal
Daniel P. W. Ellis
Xavier Serra
2019-01-29
<p>FSDKaggle2018 is an audio dataset containing 11,073 audio files annotated with 41 labels of the <a href="https://research.google.com/audioset/ontology/index.html">AudioSet Ontology</a>. FSDKaggle2018 has been used for the <a href="http://dcase.community/challenge2018/task-general-purpose-audio-tagging">DCASE Challenge 2018 Task 2</a>, which was run as a Kaggle competition titled <a href="https://www.kaggle.com/c/freesound-audio-tagging">Freesound General-Purpose Audio Tagging Challenge</a>.</p>
<p><strong>Citation</strong></p>
<p>If you use the FSDKaggle2018 dataset or part of it, please cite our <a href="https://arxiv.org/abs/1807.09902"><strong>DCASE 2018 paper</strong></a>:</p>
<blockquote>
<p>Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Favory, Jordi Pons, Xavier Serra. "General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline". <em>Proceedings of the DCASE 2018 Workshop</em> (2018)</p>
</blockquote>
<p>You can also consider citing our <a href="https://repositori.upf.edu/bitstream/handle/10230/33299/fonseca_ismir17_freesound.pdf?sequence=1&isAllowed=y"><strong>ISMIR 2017 paper</strong></a>, which describes how we gathered the manual annotations included in FSDKaggle2018.</p>
<blockquote>
<p>Eduardo Fonseca, Jordi Pons, Xavier Favory, Frederic Font, Dmitry Bogdanov, Andres Ferraro, Sergio Oramas, Alastair Porter, and Xavier Serra, "Freesound Datasets: A Platform for the Creation of Open Audio Datasets", In <em>Proceedings of the 18th International Society for Music Information Retrieval Conference</em>, Suzhou, China, 2017</p>
</blockquote>
<p><strong>Contact</strong></p>
<p>You are welcome to contact Eduardo Fonseca should you have any questions at eduardo.fonseca@upf.edu.</p>
<p><strong>About this dataset</strong></p>
<p>Freesound Dataset Kaggle 2018 (or <strong>FSDKaggle2018</strong> for short) is an audio dataset containing 11,073 audio files annotated with 41 labels of the <a href="https://research.google.com/audioset/ontology/index.html">AudioSet Ontology</a> [1]. FSDKaggle2018 has been used for Task 2 of the <em>Detection and Classification of Acoustic Scenes and Events</em> (DCASE) Challenge 2018. Please visit the <a href="http://dcase.community/challenge2018/task-general-purpose-audio-tagging">DCASE2018 Challenge Task 2 website</a> for more information. This task was hosted on the Kaggle platform as a competition titled <a href="https://www.kaggle.com/c/freesound-audio-tagging">Freesound General-Purpose Audio Tagging Challenge</a>. It was organized by researchers from the <a href="https://www.upf.edu/web/mtg">Music Technology Group</a> of Universitat Pompeu Fabra, and from <a href="https://research.google.com/audioset/about.html">Google Research’s Machine Perception Team</a>.</p>
<p>The goal of this competition was to build an audio tagging system that can categorize an audio clip as belonging to one of a set of 41 diverse categories drawn from the AudioSet Ontology.</p>
<p>All audio samples in this dataset are gathered from <a href="https://freesound.org">Freesound</a> [2] and are provided here as uncompressed PCM 16 bit, 44.1 kHz, mono audio files. Note that because Freesound content is collaboratively contributed, recording quality and techniques can vary widely.</p>
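<p>As a quick sanity check after downloading, the stated audio format can be verified per clip. The following is a minimal sketch using only the Python standard library. It assumes the clips are stored in WAV containers (the description only says uncompressed PCM), and the function names are illustrative rather than part of any dataset tooling.</p>

```python
import wave


def describe_clip(path):
    """Return (channels, bits_per_sample, sample_rate_hz, duration_s) of a WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        bits = 8 * wav.getsampwidth()  # sample width is reported in bytes
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    return channels, bits, rate, duration


def matches_dataset_format(path):
    """True if the clip is mono, 16 bit, 44.1 kHz, as stated for FSDKaggle2018."""
    channels, bits, rate, _ = describe_clip(path)
    return (channels, bits, rate) == (1, 16, 44100)
```

<p>Calling <code>matches_dataset_format</code> on every file of the train and test folders would flag any clip that deviates from the documented format. The duration returned by <code>describe_clip</code> can be used to inspect the spread of clip lengths.</p>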
<p>The ground truth data provided in this dataset has been obtained after a data labeling process which is described below in the <em>Data labeling process</em> section. FSDKaggle2018 clips are unequally distributed in the following <strong>41 categories</strong> of the AudioSet Ontology:</p>
<p>"Acoustic_guitar", "Applause", "Bark", "Bass_drum", "Burping_or_eructation", "Bus", "Cello", "Chime", "Clarinet", "Computer_keyboard", "Cough", "Cowbell", "Double_bass", "Drawer_open_or_close", "Electric_piano", "Fart", "Finger_snapping", "Fireworks", "Flute", "Glockenspiel", "Gong", "Gunshot_or_gunfire", "Harmonica", "Hi-hat", "Keys_jangling", "Knock", "Laughter", "Meow", "Microwave_oven", "Oboe", "Saxophone", "Scissors", "Shatter", "Snare_drum", "Squeak", "Tambourine", "Tearing", "Telephone", "Trumpet", "Violin_or_fiddle", "Writing".</p>
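<p>For classification code it is often convenient to map these category names to integer class indices. The sketch below uses the alphabetical order of the list above. That ordering is an assumption made for illustration only; the dataset does not prescribe any particular index assignment.</p>

```python
# The 41 FSDKaggle2018 categories, in the order listed in this description.
LABELS = [
    "Acoustic_guitar", "Applause", "Bark", "Bass_drum", "Burping_or_eructation",
    "Bus", "Cello", "Chime", "Clarinet", "Computer_keyboard", "Cough", "Cowbell",
    "Double_bass", "Drawer_open_or_close", "Electric_piano", "Fart",
    "Finger_snapping", "Fireworks", "Flute", "Glockenspiel", "Gong",
    "Gunshot_or_gunfire", "Harmonica", "Hi-hat", "Keys_jangling", "Knock",
    "Laughter", "Meow", "Microwave_oven", "Oboe", "Saxophone", "Scissors",
    "Shatter", "Snare_drum", "Squeak", "Tambourine", "Tearing", "Telephone",
    "Trumpet", "Violin_or_fiddle", "Writing",
]

# Illustrative index assignment (not defined by the dataset itself).
LABEL_TO_INDEX = {label: index for index, label in enumerate(LABELS)}
INDEX_TO_LABEL = dict(enumerate(LABELS))
```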
<p>Some other relevant characteristics of FSDKaggle2018:</p>
<ul>
<li>
<p>The dataset is split into a train set and a test set.</p>
</li>
<li>
<p>The <strong>train set</strong> is meant for system development and includes <strong>~9.5k samples unequally distributed among 41 categories</strong>. The minimum number of audio samples per category in the train set is 94, and the maximum is 300. The duration of the audio samples ranges from 300 ms to 30 s due to the diversity of the sound categories and the preferences of Freesound users when recording sounds. The total duration of the train set is roughly 18 h.</p>
</li>
<li>
<p>Out of the ~9.5k samples from the train set, <strong>~3.7k have manually-verified ground truth annotations</strong> and <strong>~5.8k have non-verified annotations</strong>. The non-verified annotations of the train set have a quality estimate of <strong>at least</strong> 65-70% in each category. Check out the <em>Data labeling process</em> section below for more information about this aspect.</p>
</li>
<li>
<p>Non-verified annotations in the train set are properly flagged in <code>train.csv</code> so that participants can opt to use this information during the development of their systems.</p>
</li>
<li>
<p>The <strong>test set</strong> is composed of <strong>1.6k samples with manually-verified annotations</strong> and with a category distribution similar to that of the train set. The total duration of the test set is roughly 2 h.</p>
</li>
<li>
<p>All audio samples in this dataset have a <strong>single label</strong> (i.e. are only annotated with one label). Check out the <em>Data labeling process</em> section below for more information about this aspect. A single label should be predicted for each file in the test set.</p>
</li>
</ul>
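<p>The unequal category distribution described above can be inspected directly from the metadata. The following is a minimal sketch that counts clips per category from an open CSV file and reports the smallest and largest class size. It assumes a header row with a <code>label</code> column, and the function names are illustrative.</p>

```python
import csv
from collections import Counter


def category_counts(csv_file):
    """Count clips per label in an open CSV file that has a 'label' column."""
    return Counter(row["label"] for row in csv.DictReader(csv_file))


def count_range(counts):
    """Return (smallest, largest) number of clips found in any category."""
    return min(counts.values()), max(counts.values())
```

<p>Applied to the train set metadata, <code>count_range</code> would be expected to return values consistent with the minimum of 94 and maximum of 300 clips per category stated above.</p>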
<p><strong>Data labeling process</strong></p>
<p>The data labeling process started from a manual mapping between Freesound tags and AudioSet Ontology categories (or <em>labels</em>), which was carried out by researchers at the Music Technology Group, Universitat Pompeu Fabra, Barcelona. Using this mapping, a number of Freesound audio samples were <strong>automatically annotated</strong> with labels from the AudioSet Ontology. These annotations can be understood as weak labels since they express the presence of a sound category in an audio sample. </p>
<p>Then, a <strong>data validation process</strong> was carried out in which a number of participants listened to the annotated sounds and manually assessed the presence/absence of an automatically assigned sound category, according to the AudioSet category description.</p>
<p>Audio samples in FSDKaggle2018 are only annotated with a single ground truth label (see <code>train.csv</code>). A total of <strong>3,710 annotations</strong> included in the train set of FSDKaggle2018 are annotations that have been <strong>manually validated</strong> as present and predominant (some with inter-annotator agreement but not all of them). This means that in most cases there is no additional acoustic material other than the labeled category. In a few cases there may be some additional sound events, but these additional events won't belong to any of the 41 categories of FSDKaggle2018.</p>
<p>The rest of the annotations have <strong>not</strong> been manually validated and therefore some of them could be inaccurate. Nonetheless, we have <strong>estimated</strong> that <strong>at least</strong> 65-70% of the non-verified annotations per category <strong>in the train set</strong> are indeed correct. It can happen that some of these non-verified audio samples present several sound sources even though only one label is provided as ground truth. These additional sources typically fall outside the set of 41 categories, but in a few cases they could belong to it.</p>
<p>More details about the data labeling process can be found in [3].</p>
<p><strong>License</strong></p>
<p>FSDKaggle2018 has licenses at two different levels, as explained next.</p>
<p>All sounds in Freesound are released under Creative Commons (CC) licenses, and each audio clip has its own license as defined by the audio clip uploader in Freesound. To facilitate attribution of these files to third parties, we include a list of the audio clips included in FSDKaggle2018 and their corresponding licenses. The licenses are specified in the files <code>train_post_competition.csv</code> and <code>test_post_competition_scoring_clips.csv</code>.</p>
<p>In addition, FSDKaggle2018 as a whole is the result of a curation process and it has an additional license. FSDKaggle2018 is released under <a href="https://creativecommons.org/licenses/by/4.0/">CC-BY</a>. This license is specified in the <code>LICENSE-DATASET</code> file downloaded with the <code>FSDKaggle2018.doc</code> zip file.</p>
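<p>Since every clip carries its own license, an attribution list can be generated from the metadata files. The following is a minimal sketch that assumes the CSV files have a header row with the columns <code>fname</code>, <code>freesound_id</code> and <code>license</code>. The output format and the function name are illustrative and not an official attribution template.</p>

```python
import csv


def attribution_lines(csv_file):
    """Build one attribution line per clip from an open metadata CSV file."""
    lines = []
    for row in csv.DictReader(csv_file):
        # Each line names the clip, its Freesound id and its per-clip license.
        lines.append(
            f'{row["fname"]} (Freesound id {row["freesound_id"]}): {row["license"]}'
        )
    return lines
```

<p>The resulting lines could be written to a text file shipped alongside any derived work, together with the CC-BY notice that applies to the dataset as a whole.</p>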
<p><strong>Files</strong></p>
<p>FSDKaggle2018 can be downloaded as a series of zip files with the following directory structure:</p>
<pre>root
│
└───FSDKaggle2018.audio_train/                      Audio clips in the train set
│
└───FSDKaggle2018.audio_test/                       Audio clips in the test set
│
└───FSDKaggle2018.meta/                             Files for evaluation setup
│   │
│   └───train_post_competition.csv                  Data split and ground truth for the train set
│   │
│   └───test_post_competition_scoring_clips.csv     Ground truth for the test set
│
└───FSDKaggle2018.doc/
    │
    └───README.md                                   The dataset description file you are reading
    │
    └───LICENSE-DATASET                             License of FSDKaggle2018 dataset as a whole
</pre>
<p><strong>NOTE</strong>: the original <code>train.csv</code> file provided during the competition has been extended with more metadata (licenses, Freesound ids, etc.) and is now provided as <code>train_post_competition.csv</code>. Likewise, the original <code>test.csv</code>, which was not public during the competition, is now available with ground truth and metadata as <code>test_post_competition_scoring_clips.csv</code>. The file name <code>test_post_competition_scoring_clips.csv</code> refers to the fact that only the 1600 clips used for ranking the systems are included. During the competition, an additional subset of <em>padding</em> clips was added in order to prevent undesired practices. This <em>padding</em> subset, which was never used for ranking the systems, is no longer included in the dataset (see our DCASE 2018 paper for more details).</p>
<p>Each row (i.e. audio clip) of the <code>train_post_competition.csv</code> file contains the following information:</p>
<ul>
<li><code>fname</code>: the file name</li>
<li><code>label</code>: the audio classification label (ground truth)</li>
<li><code>manually_verified</code>: Boolean (1 or 0) flag to indicate whether or not that annotation has been manually verified; see description above for more info</li>
<li><code>freesound_id</code>: the Freesound id for the audio clip</li>
<li><code>license</code>: the license for the audio clip</li>
</ul>
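<p>Because the <code>manually_verified</code> flag is given per row, the train set can be separated into its verified and non-verified parts. The following is a minimal sketch using the column names listed above. It assumes the CSV file has a header row with exactly those names, and the function name is illustrative.</p>

```python
import csv


def split_by_verification(csv_file):
    """Split train metadata rows into (verified, unverified) lists of (fname, label)."""
    verified, unverified = [], []
    for row in csv.DictReader(csv_file):
        # manually_verified is documented as a 1/0 flag.
        is_verified = row["manually_verified"].strip() == "1"
        target = verified if is_verified else unverified
        target.append((row["fname"], row["label"]))
    return verified, unverified
```

<p>One possible use is to hold out part of the verified subset for validation and to treat the non-verified subset as noisier training data.</p>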
<p>Each row (i.e. audio clip) of the <code>test_post_competition_scoring_clips.csv</code> file contains the following information:</p>
<ul>
<li><code>fname</code>: the file name</li>
<li><code>label</code>: the audio classification label (ground truth)</li>
<li><code>usage</code>: string that indicates to which Kaggle leaderboard the clip was associated during the competition: <code>Public</code> or <code>Private</code></li>
<li><code>freesound_id</code>: the Freesound id for the audio clip</li>
<li><code>license</code>: the license for the audio clip</li>
</ul>
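<p>The <code>usage</code> column makes it possible to reproduce the leaderboard split of the test clips. The following is a minimal sketch that groups test clips by that column. It assumes a header row with the column names listed above, and the function name is illustrative.</p>

```python
import csv
from collections import defaultdict


def clips_by_leaderboard(csv_file):
    """Group test clips by their 'usage' value (documented as Public or Private)."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row["usage"]].append((row["fname"], row["label"]))
    return dict(groups)
```

<p>Scoring a system separately on the two groups allows a comparison with results reported for either Kaggle leaderboard.</p>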
<p><strong>Baseline System</strong></p>
<p>A CNN baseline system for FSDKaggle2018 is available at <a href="https://github.com/DCASE-REPO/dcase2018_baseline/tree/master/task2">https://github.com/DCASE-REPO/dcase2018_baseline/tree/master/task2</a>.</p>
<p><strong>References and links</strong></p>
<p>[1] Jort F Gemmeke, Daniel PW Ellis, Dylan Freedman, Aren Jansen, Wade Lawrence, R Channing Moore, Manoj Plakal, and Marvin Ritter. "Audio set: An ontology and human-labeled dataset for audio events." Proceedings of the Acoustics, Speech and Signal Processing International Conference, 2017.</p>
<p>[2] Frederic Font, Gerard Roma, and Xavier Serra. "Freesound technical demo." Proceedings of the 21st ACM international conference on Multimedia, 2013. <a href="https://freesound.org">https://freesound.org</a></p>
<p>[3] Eduardo Fonseca, Jordi Pons, Xavier Favory, Frederic Font, Dmitry Bogdanov, Andres Ferraro, Sergio Oramas, Alastair Porter, and Xavier Serra. "Freesound Datasets: A Platform for the Creation of Open Audio Datasets." Proceedings of the International Conference on Music Information Retrieval, 2017. <a href="https://ismir2017.smcnus.org/wp-content/uploads/2017/10/161_Paper.pdf">PDF here</a></p>
<p>Freesound Annotator: <a href="https://annotator.freesound.org/">https://annotator.freesound.org/</a><br>
Freesound: <a href="https://freesound.org">https://freesound.org</a><br>
Eduardo Fonseca's personal website: <a href="http://www.eduardofonseca.net/">http://www.eduardofonseca.net/</a><br>
More datasets collected by us: <a href="http://www.eduardofonseca.net/datasets/">http://www.eduardofonseca.net/datasets/</a></p>
<p><strong>Acknowledgments</strong></p>
<p>This work is partially supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 688382 <a href="https://www.audiocommons.org/">AudioCommons</a>. Eduardo Fonseca is also sponsored by a <a href="https://ai.googleblog.com/2018/03/google-faculty-research-awards-2017.html">Google Faculty Research Award 2017</a>. We thank everyone who contributed to FSDKaggle2018 with annotations.</p>
https://doi.org/10.5281/zenodo.2552860
oai:zenodo.org:2552860
Zenodo
https://arxiv.org/abs/1807.09902
https://zenodo.org/communities/mtgupf
https://zenodo.org/communities/freesound-datasets
https://zenodo.org/communities/eu
https://zenodo.org/communities/mdm-dtic-upf
https://doi.org/10.5281/zenodo.2552859
info:eu-repo/semantics/openAccess
Other (Attribution)
audio dataset
audio tagging
Kaggle
DCASE
everyday sounds
FSDKaggle2018
info:eu-repo/semantics/other