Eduardo Fonseca, Xavier Favory, Jordi Pons, Frederic Font, Xavier Serra
FSD50K is an open dataset of human-labeled sound events containing 51,197 Freesound clips unequally distributed across 200 classes drawn from the AudioSet Ontology. FSD50K was created at the Music Technology Group of Universitat Pompeu Fabra.
Citation
If you use the FSD50K dataset, or part of it, please cite our TASLP paper (available from arXiv and from IEEE TASLP):
@article{fonseca2022FSD50K,
  title={{FSD50K}: an open dataset of human-labeled sound events},
  author={Fonseca, Eduardo and Favory, Xavier and Pons, Jordi and Font, Frederic and Serra, Xavier},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  volume={30},
  pages={829--852},
  year={2022},
  publisher={IEEE}
}
Paper update: This paper was published in TASLP at the beginning of 2022. The accepted camera-ready version includes a number of improvements over the initial submission. The main updates are: an estimate of the amount of label noise in FSD50K, an SNR comparison between FSD50K and AudioSet, an improved description of the evaluation metrics (including equations), clarification of the experimental methodology and of some results, and some content moved to the Appendix for readability. The TASLP-accepted camera-ready version is available from arXiv (it is v2 on arXiv, which is displayed by default).
Data curators
Eduardo Fonseca, Xavier Favory, Jordi Pons, Mercedes Collado, Ceren Can, Rachit Gupta, Javier Arredondo, Gary Avendano and Sara Fernandez
Contact
If you have any questions, you are welcome to contact Eduardo Fonseca at efonseca@google.com.
ABOUT FSD50K
Freesound Dataset 50k (or FSD50K for short) is an open dataset of human-labeled sound events containing 51,197 Freesound clips unequally distributed across 200 classes drawn from the AudioSet Ontology [1]. FSD50K was created at the Music Technology Group of Universitat Pompeu Fabra.
What follows is a brief summary of FSD50K's most important characteristics. Please see our paper (especially Section 4) for further details relevant to its usage, as well as a discussion of limitations, applications and more.
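Basic characteristics:
FSD50K contains 51,197 audio clips distributed across 200 sound classes; the vocabulary is listed in vocabulary.csv (see the Files section below).
Dev set: 40,966 clips.
Eval set: 10,231 clips.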
Note: All classes in FSD50K are represented in AudioSet, except "Crash cymbal", "Human group actions", "Human voice", "Respiratory sounds", and "Domestic sounds, home sounds".
LICENSE
All audio clips in FSD50K are released under Creative Commons (CC) licenses. Each clip has its own license as defined by the clip uploader in Freesound, some of them requiring attribution to their original authors and some forbidding further commercial reuse. Specifically:
The development set consists of 40,966 clips with the following licenses:
The evaluation set consists of 10,231 clips with the following licenses:
For attribution purposes, and to facilitate attribution of these files to third parties, we include a mapping from the audio clips to their corresponding licenses. The licenses are specified in the files dev_clips_info_FSD50K.json and eval_clips_info_FSD50K.json.
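To build attribution notices, or to see how many clips carry each license, the per-clip license can be read from these JSON files. The sketch below is a minimal example that assumes each entry is keyed by Freesound id and stores its metadata under the keys "title", "uploader" and "license"; those key names are assumptions, so check them against the actual files before relying on them.

```python
import json
from collections import Counter


def load_clips_info(path: str) -> dict:
    """Load dev_clips_info_FSD50K.json or eval_clips_info_FSD50K.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def license_counts(clips_info: dict) -> Counter:
    """Count clips per license (assumes a 'license' key in every entry)."""
    return Counter(info["license"] for info in clips_info.values())


def attribution_line(fname: str, info: dict) -> str:
    """Build a simple attribution string (assumed keys: 'title', 'uploader', 'license')."""
    return (f'"{info["title"]}" by {info["uploader"]} '
            f'(Freesound id {fname}), license: {info["license"]}')
```

For example, license_counts(load_clips_info("FSD50K.metadata/dev_clips_info_FSD50K.json")).most_common() lists the dev-set licenses from most to least frequent.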
In addition, FSD50K as a whole is the result of a curation process and has an additional license: FSD50K is released under CC-BY. This license is specified in the LICENSE-DATASET file downloaded with the FSD50K.doc zip file. We note that choosing a single license for the dataset as a whole is not straightforward, as it comprises items with different licenses (such as audio clips, annotations, or data split). The choice of a global license in such cases may warrant further investigation (e.g., by someone with a background in copyright law).
Usage of FSD50K for commercial purposes:
If you'd like to use FSD50K for commercial purposes, please contact Eduardo Fonseca and Frederic Font at efonseca@google.com and frederic.font@upf.edu.
Also, if you are interested in using FSD50K for machine learning competitions, please contact Eduardo Fonseca and Frederic Font at efonseca@google.com and frederic.font@upf.edu.
FILES
FSD50K can be downloaded as a series of zip files with the following directory structure:
root
│
└───FSD50K.dev_audio/                  Audio clips in the dev set
│
└───FSD50K.eval_audio/                 Audio clips in the eval set
│
└───FSD50K.ground_truth/               Files for FSD50K's ground truth
│   │
│   └─── dev.csv                       Ground truth for the dev set
│   │
│   └─── eval.csv                      Ground truth for the eval set
│   │
│   └─── vocabulary.csv                List of 200 sound classes in FSD50K
│
└───FSD50K.metadata/                   Files for additional metadata
│   │
│   └─── class_info_FSD50K.json        Metadata about the sound classes
│   │
│   └─── dev_clips_info_FSD50K.json    Metadata about the dev clips
│   │
│   └─── eval_clips_info_FSD50K.json   Metadata about the eval clips
│   │
│   └─── pp_pnp_ratings_FSD50K.json    PP/PNP ratings
│   │
│   └─── collection/                   Files for the *sound collection* format
│
└───FSD50K.doc/
    │
    └───README.md                      The dataset description file that you are reading
    │
    └───LICENSE-DATASET                License of the FSD50K dataset as an entity
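After extracting the zip files, a quick check that the local layout matches the tree above can save debugging time later. This is a minimal sketch that assumes all archives were extracted side by side into one common root directory; the relative paths are copied from the tree.

```python
from pathlib import Path

# Relative paths copied from the directory tree above.
EXPECTED = [
    "FSD50K.dev_audio",
    "FSD50K.eval_audio",
    "FSD50K.ground_truth/dev.csv",
    "FSD50K.ground_truth/eval.csv",
    "FSD50K.ground_truth/vocabulary.csv",
    "FSD50K.metadata/class_info_FSD50K.json",
    "FSD50K.metadata/dev_clips_info_FSD50K.json",
    "FSD50K.metadata/eval_clips_info_FSD50K.json",
    "FSD50K.metadata/pp_pnp_ratings_FSD50K.json",
    "FSD50K.metadata/collection",
    "FSD50K.doc/README.md",
    "FSD50K.doc/LICENSE-DATASET",
]


def missing_paths(root) -> list[str]:
    """Return the expected files/folders that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

An empty list from missing_paths("path/to/FSD50K") means every expected file and folder was found.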
Each row (i.e., audio clip) of dev.csv contains the following information:
fname: the file name without the .wav extension, e.g., the fname 64760 corresponds to the file 64760.wav on disk. This number is the Freesound id. We always use Freesound ids as filenames.
labels: the class labels (i.e., the ground truth). Note that these class labels are smeared, i.e., the labels have been propagated upwards to the root of the ontology. More details about the label smearing process can be found in Appendix D of our paper.
mids: the Freebase identifiers corresponding to the class labels, as defined in the AudioSet Ontology specification.
split: whether the clip belongs to train or val (see the paper for details on the proposed split).
Rows in eval.csv follow the same format, except that there is no split column.
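A minimal sketch of reading the ground truth with Python's standard csv module, assuming dev.csv has a header row with the column names listed above. The labels and mids fields hold comma-separated lists inside quoted CSV fields, so csv.DictReader is used rather than a plain split on commas. The rows in SAMPLE are made up for illustration (hypothetical ids and mids).

```python
import csv
import io
from typing import TextIO


def read_ground_truth(f: TextIO) -> list[dict]:
    """Parse dev.csv or eval.csv into dicts with list-valued labels and mids."""
    rows = []
    for row in csv.DictReader(f):
        rows.append({
            "fname": row["fname"],               # Freesound id, i.e. <fname>.wav on disk
            "labels": row["labels"].split(","),  # smeared class labels
            "mids": row["mids"].split(","),      # matching Freebase identifiers
            "split": row.get("split"),           # None for eval.csv (no split column)
        })
    return rows


# Hypothetical rows (made-up ids and mids) in the column layout described above.
SAMPLE = '''fname,labels,mids,split
1000,"Bird,Wild_Animal,Animal","/m/a,/m/b,/m/c",train
1001,Domestic_sounds_and_home_sounds,/m/d,val
'''

rows = read_ground_truth(io.StringIO(SAMPLE))
train = [r for r in rows if r["split"] == "train"]
val = [r for r in rows if r["split"] == "val"]
print(len(train), len(val))
```

For the real file, open it with open("FSD50K.ground_truth/dev.csv", newline="", encoding="utf-8") and pass the file object to read_ground_truth.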
Note: We use a slightly different format than AudioSet for naming class labels, in order to avoid potential problems with spaces, commas, etc. Example: we use Accelerating_and_revving_and_vroom instead of the original Accelerating, revving, vroom. You can recover the original AudioSet naming using the information provided in vocabulary.csv (class label and mid for the 200 classes of FSD50K) and the AudioSet Ontology specification.
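Going back to the original AudioSet names amounts to two lookups: FSD50K label to mid (from vocabulary.csv), then mid to name (from the AudioSet Ontology's ontology.json, a JSON list of nodes with "id" and "name" fields). The sketch assumes vocabulary.csv has no header and rows of the form index,label,mid; that layout is an assumption. The mid "/m/placeholder" in the demo is made up and is not a real AudioSet identifier.

```python
import csv
import io
import json


def load_vocabulary(f) -> dict:
    """Map FSD50K label -> mid (assumes header-less rows: index,label,mid)."""
    return {label: mid for _index, label, mid in csv.reader(f)}


def load_ontology_names(path: str) -> dict:
    """Map mid -> original AudioSet name, read from the AudioSet ontology.json."""
    with open(path, encoding="utf-8") as f:
        return {node["id"]: node["name"] for node in json.load(f)}


def to_audioset_name(label: str, vocab: dict, names: dict) -> str:
    """Translate an FSD50K label into its original AudioSet name."""
    return names[vocab[label]]


# Made-up demo data; "/m/placeholder" is NOT a real AudioSet mid.
vocab = load_vocabulary(io.StringIO("0,Accelerating_and_revving_and_vroom,/m/placeholder\n"))
names = {"/m/placeholder": "Accelerating, revving, vroom"}
print(to_audioset_name("Accelerating_and_revving_and_vroom", vocab, names))
```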
Files with additional metadata (FSD50K.metadata/)
To allow a variety of analyses and approaches with FSD50K, we provide the following metadata:
class_info_FSD50K.json: Python dictionary where each entry corresponds to one sound class and contains: FAQs utilized during the annotation of the class, examples (representative audio clips), and verification_examples (audio clips presented to raters during annotation as a quality control mechanism). Audio clips are described by their Freesound id. Note: some of these examples may not be included in the FSD50K release.
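Since some example clips may be absent from the release, a cross-check against the fnames of dev.csv and eval.csv shows which ones can actually be listened to locally. A minimal sketch, assuming the JSON is keyed by class label and that examples and verification_examples are lists of Freesound ids (ints or strings); both are assumptions about the file layout.

```python
def examples_missing_from_release(class_info: dict, release_ids: set[str]) -> dict[str, list[str]]:
    """For each class, list example/verification-example ids not in the release."""
    missing = {}
    for cls, info in class_info.items():
        # Normalize ids to strings so they compare equal to CSV fnames.
        ids = [str(i) for i in info.get("examples", []) + info.get("verification_examples", [])]
        absent = [i for i in ids if i not in release_ids]
        if absent:
            missing[cls] = absent
    return missing
```

Here release_ids would be the set of fname values collected from both ground-truth CSV files.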
dev_clips_info_FSD50K.json: Python dictionary where each entry corresponds to one dev clip and contains: title, description, tags, clip license, and the uploader name. All these metadata are provided by the uploader.
eval_clips_info_FSD50K.json: same as above, but for the eval clips.
pp_pnp_ratings_FSD50K.json: Python dictionary where each entry corresponds to one clip in the dataset and contains the PP/PNP ratings for the labels associated with the clip. More specifically, these ratings were gathered for the labels validated in the validation task (Sec. 3 of the paper). This file includes 59,485 labels for the 51,197 clips in FSD50K. Out of these labels:
Ratings' legend: PP=1 (Present and Predominant); PNP=0.5 (Present but Not Predominant); U=0 (Unsure); NP=-1 (Not Present).
Note: The PP/PNP ratings were collected in the validation task. Subsequently, the subset of these clips corresponding to the eval set was exhaustively labeled in the refinement task, hence receiving additional labels in many cases. For these eval clips, you may want to check their labels in eval.csv to get more information about their audio content (see Sec. 3 of the paper for details).
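A small sketch of how the ratings of one clip could be summarized per label using the legend above. It assumes each clip entry maps a label to the list of individual ratings it received; that nesting is an assumption, so inspect the file first. The "predominant" rule (a strict majority of PP votes) is a hypothetical example, not a criterion defined by the dataset authors.

```python
from statistics import mean

# Rating legend from the description above.
LEGEND = {1.0: "PP", 0.5: "PNP", 0.0: "U", -1.0: "NP"}


def summarize_ratings(clip_ratings: dict[str, list[float]]) -> dict[str, dict]:
    """Summarize the ratings of one clip: mean score, vote counts, majority-PP flag."""
    summary = {}
    for label, ratings in clip_ratings.items():
        names = [LEGEND[float(r)] for r in ratings]
        summary[label] = {
            "mean": mean(ratings),
            "votes": {n: names.count(n) for n in sorted(set(names))},
            # Hypothetical rule: predominant if more than half of the raters chose PP.
            "predominant": names.count("PP") > len(names) / 2,
        }
    return summary
```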
collection/: This folder contains metadata for what we call the sound collection format. This format consists of the raw annotations gathered, featuring all generated class labels without any restriction.
We provide the collection format to make available some annotations that do not appear in the FSD50K ground truth release. This typically happens for classes for which we gathered human-provided annotations but that were discarded in the FSD50K release due to data scarcity (more specifically, they were merged with their parents). In other words, the main purpose of the collection format is to make available annotations for tiny classes. The format of these files is analogous to that of the files in FSD50K.ground_truth/. A couple of examples show the differences between the collection and ground truth formats:
clip | labels_in_collection | labels_in_ground_truth
---|---|---
51690 | Owl | Bird,Wild_Animal,Animal
190579 | Toothbrush,Electric_toothbrush | Domestic_sounds_and_home_sounds
In the first example, raters provided the label Owl. However, due to data scarcity, Owl labels were merged into their parent Bird. Then, the labels Wild_Animal,Animal were added via label propagation (smearing). The second example shows one of the most extreme cases, where raters provided the labels Electric_toothbrush,Toothbrush, both of which had little data. Hence, they were merged into Toothbrush's parent, which unfortunately is Domestic_sounds_and_home_sounds (a rather vague class containing a variety of child sound classes).
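The two steps described above (merging a scarce class into its parent, then smearing labels up to the root) can be illustrated with a toy hierarchy. The PARENTS and MERGED tables below are hypothetical fragments chosen to reproduce the two examples above; they are not the real AudioSet Ontology, and this is not the curators' actual code.

```python
from collections import deque

# Hypothetical ontology fragment: class -> list of parents
# (in the real ontology some classes have several parents).
PARENTS = {
    "Owl": ["Bird"],
    "Bird": ["Wild_Animal"],
    "Wild_Animal": ["Animal"],
    "Electric_toothbrush": ["Toothbrush"],
    "Toothbrush": ["Domestic_sounds_and_home_sounds"],
}

# Hypothetical set of classes discarded for data scarcity (merged upwards).
MERGED = {"Owl", "Electric_toothbrush", "Toothbrush"}


def merge_label(label: str) -> list[str]:
    """Replace a discarded class by its nearest kept ancestor(s)."""
    if label not in MERGED:
        return [label]
    out = []
    for parent in PARENTS.get(label, []):
        out.extend(merge_label(parent))
    return out


def smear(labels: list[str]) -> list[str]:
    """Propagate labels upwards to the root, keeping first-seen order."""
    seen = []
    queue = deque(labels)
    while queue:
        label = queue.popleft()
        if label in seen:
            continue
        seen.append(label)
        queue.extend(PARENTS.get(label, []))
    return seen


def collection_to_ground_truth(labels: list[str]) -> list[str]:
    """Merge scarce classes, deduplicate, then smear."""
    merged = []
    for label in labels:
        for m in merge_label(label):
            if m not in merged:
                merged.append(m)
    return smear(merged)


print(collection_to_ground_truth(["Owl"]))
print(collection_to_ground_truth(["Toothbrush", "Electric_toothbrush"]))
```

With these toy tables the two calls print ['Bird', 'Wild_Animal', 'Animal'] and ['Domestic_sounds_and_home_sounds'], matching the ground-truth column of the examples.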
Note: Labels in the collection format are not smeared.
Note: While the vocabulary of FSD50K's ground truth encompasses 200 classes (shared by dev and eval), the collection format is composed of raw annotations, so its vocabulary is much larger (over 350 classes) and slightly different between dev and eval.
For further questions, please contact efonseca@google.com, or join the freesound-annotator Google Group.
DOWNLOAD
The folders FSD50K.ground_truth/, FSD50K.metadata/ and FSD50K.doc/ are compressed into one zip file each. However, due to their large size, the folders FSD50K.dev_audio/ and FSD50K.eval_audio/ are split into several files. Specifically, FSD50K.dev_audio/ is split into six files (note that the last file is not *.z06, but *.zip):
FSD50K.dev_audio.z01
FSD50K.dev_audio.z02
FSD50K.dev_audio.z03
FSD50K.dev_audio.z04
FSD50K.dev_audio.z05
FSD50K.dev_audio.zip
In this case, you first have to download all the files. Once they are downloaded, merge them into a single zip file (e.g., unsplit.zip) on your local machine:
zip -s 0 FSD50K.dev_audio.zip --out unsplit.zip
Finally, unzip the merged file:
unzip unsplit.zip
The same steps apply to the FSD50K.eval_audio/ folder (only two files in this case: FSD50K.eval_audio.z01 and FSD50K.eval_audio.zip).
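Before merging and unzipping, it can be worth checking the downloads against the md5 checksums listed in the file table of this record. A minimal sketch using Python's hashlib, reading each file in 1 MiB chunks so the multi-gigabyte parts are never loaded into memory at once; it assumes all downloaded files sit in one folder.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file table of this record.
EXPECTED_MD5 = {
    "FSD50K.dev_audio.z01": "faa7cf4cc076fc34a44a479a5ed862a3",
    "FSD50K.dev_audio.z02": "8f9b66153e68571164fb1315d00bc7bc",
    "FSD50K.dev_audio.z03": "1196ef47d267a993d30fa98af54b7159",
    "FSD50K.dev_audio.z04": "d088ac4e11ba53daf9f7574c11cccac9",
    "FSD50K.dev_audio.z05": "81356521aa159accd3c35de22da28c7f",
    "FSD50K.dev_audio.zip": "c480d119b8f7a7e32fdb58f3ea4d6c5a",
    "FSD50K.doc.zip": "3516162b82dc2945d3e7feba0904e800",
    "FSD50K.eval_audio.z01": "3090670eaeecc013ca1ff84fe4442aeb",
    "FSD50K.eval_audio.zip": "6fa47636c3a3ad5c7dfeba99f2637982",
    "FSD50K.ground_truth.zip": "ca27382c195e37d2269c4c866dd73485",
    "FSD50K.metadata.zip": "b9ea0c829a411c1d42adb9da539ed237",
}


def md5sum(path, chunk_size: int = 1 << 20) -> str:
    """Hex md5 digest of a file, read chunk by chunk."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_downloads(folder) -> dict:
    """Map each expected file name to True if it exists and its md5 matches."""
    folder = Path(folder)
    return {
        name: (folder / name).is_file() and md5sum(folder / name) == expected
        for name, expected in EXPECTED_MD5.items()
    }
```

For example, [n for n, ok in verify_downloads(".").items() if not ok] lists the files that are missing or need to be downloaded again.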
BASELINE SYSTEM
Several baseline systems for FSD50K are available at https://github.com/edufonseca/FSD50K_baseline. The experiments are described in Section 5 of our paper.
REFERENCES AND LINKS
[1] Jort F. Gemmeke, Daniel P. W. Ellis, Dylan Freedman, Aren Jansen, Wade Lawrence, R. Channing Moore, Manoj Plakal, and Marvin Ritter. "Audio Set: An ontology and human-labeled dataset for audio events." In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.
[2] Eduardo Fonseca, Jordi Pons, Xavier Favory, Frederic Font, Dmitry Bogdanov, Andres Ferraro, Sergio Oramas, Alastair Porter, and Xavier Serra. "Freesound Datasets: A Platform for the Creation of Open Audio Datasets." In Proceedings of the International Conference on Music Information Retrieval (ISMIR), 2017.
Companion site for FSD50K: https://annotator.freesound.org/fsd/release/FSD50K/
Freesound Annotator: https://annotator.freesound.org/
Freesound: https://freesound.org
Eduardo Fonseca's personal website: http://www.eduardofonseca.net/
More datasets collected by us: http://www.eduardofonseca.net/datasets/
ACKNOWLEDGMENTS
The authors would like to thank everyone who contributed to FSD50K with annotations, and especially Mercedes Collado, Ceren Can, Rachit Gupta, Javier Arredondo, Gary Avendano and Sara Fernandez for their commitment and perseverance. The authors would also like to thank Daniel P.W. Ellis and Manoj Plakal from Google Research for valuable discussions. This work was partially supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 688382 AudioCommons, by two Google Faculty Research Awards (2017 and 2018), and by the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502).
Name | MD5 | Size
---|---|---
FSD50K.dev_audio.z01 | faa7cf4cc076fc34a44a479a5ed862a3 | 3.2 GB
FSD50K.dev_audio.z02 | 8f9b66153e68571164fb1315d00bc7bc | 3.2 GB
FSD50K.dev_audio.z03 | 1196ef47d267a993d30fa98af54b7159 | 3.2 GB
FSD50K.dev_audio.z04 | d088ac4e11ba53daf9f7574c11cccac9 | 3.2 GB
FSD50K.dev_audio.z05 | 81356521aa159accd3c35de22da28c7f | 3.2 GB
FSD50K.dev_audio.zip | c480d119b8f7a7e32fdb58f3ea4d6c5a | 2.3 GB
FSD50K.doc.zip | 3516162b82dc2945d3e7feba0904e800 | 7.0 kB
FSD50K.eval_audio.z01 | 3090670eaeecc013ca1ff84fe4442aeb | 3.2 GB
FSD50K.eval_audio.zip | 6fa47636c3a3ad5c7dfeba99f2637982 | 3.0 GB
FSD50K.ground_truth.zip | ca27382c195e37d2269c4c866dd73485 | 334.7 kB
FSD50K.metadata.zip | b9ea0c829a411c1d42adb9da539ed237 | 6.7 MB