BSD10k (Broad Sound Dataset 10k)
Authors/Creators
Description
The BSD10k dataset (Broad Sound Dataset 10k) is an open collection of human-labeled sounds containing over 10k Freesound audio clips, annotated according to the 23 second-level classes defined in the Broad Sound Taxonomy (BST). BST is currently being used in Freesound for organization, filtering, and post-processing tasks. The dataset was created at the Music Technology Group of Universitat Pompeu Fabra. The current version of the dataset is v1.1.
Dataset characteristics
This document describes the second version of the dataset, BSD10k v1.1, and it includes updates and additional sounds over the original BSD10k (v1.0) dataset. The dataset consists of 10,956 sounds from Freesound, totaling 35.25 hours of single-labeled audio. The sounds are cropped to a maximum length of 30 seconds, resulting in variable durations ranging from 0.01 to 30s. Audio lengths vary due to the heterogeneity of the sound classes and the range of contributions from Freesound users. The original files downloaded from Freesound are converted to a standardized format of uncompressed WAV files with 44.1 kHz sampling rate, 16-bit depth, and mono channel. The dataset’s audio files occupy approximately 11.2 GB when unzipped and can be found in the audio folder.
All sounds have been manually labeled by human annotators and categorized into 23 classes, which are the second-level categories of the Broad Sound Taxonomy (see details below). The annotated data has a non-uniform distribution, consisting of 1716 Music, 2368 Instrument samples, 1343 Speech, 4003 Sound effects, and 1526 Soundscapes unevenly distributed among the second-level classes. For each audio file, the current version of the dataset (BSD10k v1.1) includes the following: the category label assigned during annotation and the annotator's confidence score, descriptive metadata (title, tags, description), and provenance information (ID, uploader, license), all provided in BSD10k_metadata.csv. We also provide precomputed audio and text embeddings to facilitate further analysis and reproducibility, located in the features folder. For more details on the dataset creation and its contents, please refer to our paper "Heterogeneous Sound Classification with the Broad Sound Taxonomy and Dataset", specifically Section 3.1. This version of the dataset (BSD10k v1.1) is first presented in "Hierarchical and Multimodal Learning for Heterogeneous Sound Classification". An overview of the BSD10k dataset is also available on the support site.
Taxonomy
The Broad Sound Taxonomy (BST) organizes sounds into a two-level hierarchical structure with 5 top-level and 23 second-level categories. The top-level categories cover distinct types of sounds: Music, Instrument samples, Speech, Sound effects, and Soundscapes. The taxonomy is designed to classify any type of sound while remaining broad, comprehensive, and easy to use. It can be used for organizing and filtering sounds in heterogeneous sound collections, such as Freesound, as well as in personal sound libraries. More details about the categories can be found in BST_description.csv, and additional information about the taxonomy is provided in the journal paper "A General-Purpose Sound Taxonomy for the Classification of Heterogeneous Sound Collections".
Citation
When using all or part of the BSD10k dataset, please cite our papers:
Dataset creation (original version) (available from [UPF e-repositori] [arXiv] [DCASE2024 proceedings]):
@inproceedings{anastasopoulou2024heterogeneous,
title = {Heterogeneous Sound Classification with the {{Broad Sound Taxonomy}} and {{Dataset}}},
author = {Anastasopoulou, Panagiota and Torrey, Jessica and Serra, Xavier and Font, Frederic},
booktitle = {Proc. {{Workshop}} on {{Detection}} and {{Classification}} of {{Acoustic Scenes}} and {{Events}} ({{DCASE}})},
year = {2024}
}
Updated dataset version (v1.1) (available from [UPF e-repositori] [DCASE2025 proceedings]):
@inproceedings{anastasopoulou2025hierarchical,
title = {Hierarchical and Multimodal Learning for Heterogeneous Sound Classification},
author = {Anastasopoulou, Panagiota and Dal R{\'i}, Francesco Ardan and Serra, Xavier and Font, Frederic},
booktitle = {Proc. {{Workshop}} on {{Detection}} and {{Classification}} of {{Acoustic Scenes}} and {{Events}} ({{DCASE}})},
year = {2025}
}
License
BSD10k is released in its entirety under the CC BY 4.0 license. We note, though, that each audio file is released under its own Creative Commons (CC) license, as defined by the respective uploader in Freesound. Some sounds require attribution to their original authors, while others forbid commercial reuse. If the dataset is used in a commercial setting, the sounds with CC BY-NC licenses should be excluded.
This is the distribution of sounds per license:
- CC0: 3,334
- CC BY: 5,970
- CC BY-NC: 1,240
- CC Sampling+: 412
Links to the license deeds for each sound can be further accessed through BSD10k_metadata.csv.
Data structure
BSD10k can be accessed as follows:
𝚛𝚘𝚘𝚝/
├── 𝚊𝚞𝚍𝚒𝚘/ 𝙰𝚞𝚍𝚒𝚘 𝚏𝚒𝚕𝚎𝚜
├── 𝚖𝚎𝚝𝚊𝚍𝚊𝚝𝚊/ 𝙼𝚎𝚝𝚊𝚍𝚊𝚝𝚊 𝚏𝚒𝚕𝚎𝚜
│ ├── 𝙱𝚂𝙳𝟷0𝚔_𝚖𝚎𝚝𝚊𝚍𝚊𝚝𝚊.𝚌𝚜𝚟 𝙳𝚊𝚝𝚊𝚜𝚎𝚝'𝚜 𝚖𝚎𝚝𝚊𝚍𝚊𝚝𝚊
│ ├── 𝙱𝚂𝚃_𝚍𝚎𝚜𝚌𝚛𝚒𝚙𝚝𝚒𝚘𝚗.𝚌𝚜𝚟 𝚃𝚊𝚡𝚘𝚗𝚘𝚖𝚢 𝚒𝚗𝚏𝚘𝚛𝚖𝚊𝚝𝚒𝚘𝚗
│ └── 𝙱𝚂𝚃_𝚍𝚒𝚊𝚐𝚛𝚊𝚖.𝚙𝚗𝚐 𝚃𝚊𝚡𝚘𝚗𝚘𝚖𝚢 𝚍𝚒𝚊𝚐𝚛𝚊𝚖
├── 𝚏𝚎𝚊𝚝𝚞𝚛𝚎𝚜/ 𝙿𝚛𝚎𝚌𝚘𝚖𝚙𝚞𝚝𝚎𝚍 𝚎𝚖𝚋𝚎𝚍𝚍𝚒𝚗𝚐𝚜
│ ├── 𝚌𝚕𝚊𝚙_𝚊𝚞𝚍𝚒𝚘_𝚎𝚖𝚋𝚎𝚍𝚍𝚒𝚗𝚐𝚜 𝙰𝚞𝚍𝚒𝚘 𝚎𝚖𝚋𝚎𝚍𝚍𝚒𝚗𝚐𝚜
│ └── 𝚌𝚕𝚊𝚙_𝚝𝚎𝚡𝚝_𝚎𝚖𝚋𝚎𝚍𝚍𝚒𝚗𝚐𝚜 𝚃𝚎𝚡𝚝 𝚎𝚖𝚋𝚎𝚍𝚍𝚒𝚗𝚐𝚜
└── 𝚁𝙴𝙰𝙳𝙼𝙴.𝚖𝚍 𝙳𝚘𝚌𝚞𝚖𝚎𝚗𝚝𝚊𝚝𝚒𝚘𝚗 (𝚝𝚑𝚊𝚝 𝚢𝚘𝚞 𝚊𝚛𝚎 𝚗𝚘𝚠 𝚛𝚎𝚊𝚍𝚒𝚗𝚐)
BSD10k_metadata.csv is the main metadata file, containing annotations and additional information for each sound. Each row corresponds to one sound and includes the following fields:
sound_id: Freesound ID used as the unique identifier of the sound. The audio files found in theaudiofolder are named using this ID, with a .wav extension for the audio format.class: Second-level class code of the sound.class_idx: Second-level class index (0-22), ordered according to the taxonomy.class_top: Corresponding top-level class code. It is derived from the full (second-level) class code by taking the part before the hyphen (-).confidence: Annotator's confidence score assigned to each sound during the annotation process. It ranges from 1 (very unconfident) to 5 (very confident).uploader: User who uploaded the sound in Freesound.license: Link to the license of the sound.title: Sound title provided by the uploader.tags: Tags associated with the sound provided by the uploader.description: Description of the sound provided by the uploader.
The mapping of class codes to their corresponding full class names can be found in BST_description.csv, which also includes a description and examples for each class (minor ancillary updates from the initial version). A diagram of the taxonomy (BST_diagram.png) is also included for a quick overview of the categories.
The features folder contains two subfolders with audio and text embeddings, both extracted using the LAION-CLAP model. The text embeddings use all available textual descriptive metadata, including title, tags, and description.
Versioning details
v1.1 – 2025-10-14 (current)
- Sound count: 10,956
- Total hours: 35.25
- Metadata fields: sound_id, class (code, index, top level), confidence, title, tags, description, uploader, license
- Features (precomputed): CLAP audio and text embeddings
- Notes: Added new sounds and corrected human labeling errors; included descriptions as part of descriptive metadata, annotation confidence scores, and precomputed embeddings; minor updates in taxonomy file.
v1.0 – 2024-07-11
- Sound count: 10,309
- Total hours: 32.5
- Metadata fields: sound_id, class (code, index, top level), title, tags, uploader, license
- Notes: Initial version
Acknowledgments
This research is partially funded by the Generalitat de Catalunya (2023FI-100252, Joan Oró program), the IA y Música Cátedra (TSI-100929-2023-1, Cátedras ENIA 2022, SE Digitalización e IA, EU NGEU), and the IMPA project (PID2023-152250OB-I00, MCIU, AEI, co-funded by EU).
Contact
You are welcome to contact Panagiota Anastasopoulou if you have any questions, at panagiota.anastasopoulou@upf.edu.
Files
metadata.zip
Additional details
Related works
- Is supplement to
- Conference paper: 10230/71472 (Handle)
- Is supplemented by
- Workflow: https://github.com/allholy/BSD10k (URL)
Funding
- Ministerio de Ciencia, Innovación y Universidades
- Agencia Estatal de Investigación
- Departament de Recerca i Universitats