There is a newer version of the record available.

Published July 11, 2024 | Version 1.0
Dataset Open

BSD10k (Broad Sound Dataset 10k)

  • 1. Pompeu Fabra University

Description

The BSD10k dataset (Broad Sound Dataset 10k) is an open collection of human-labeled sounds containing over 10k Freesound audio clips, annotated according to the 23 second-level classes defined in the Broad Sound Taxonomy. The dataset has been created at the Music Technology Group of Universitat Pompeu Fabra.

 

Dataset characteristics

The dataset consists of 10,309 sounds from Freesound, totaling ~32.5 hours of single-labeled audio. The sounds are cropped to a maximum length of 30 seconds, resulting in durations that range from 0.01 s to 30 s. Audio lengths vary due to the heterogeneity of the sound classes and the range of contributions from Freesound users. The original files downloaded from Freesound are converted to a standardized format: uncompressed WAV with a 44.1 kHz sampling rate, 16-bit depth, and a single (mono) channel. The dataset’s audio files occupy approximately 7.9 GB and can be found in the audio folder.
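As a minimal sketch of how the format stated above could be checked for a single file, the following uses Python's standard `wave` module. The helper names (`describe_wav`, `matches_stated_format`) and the small duration tolerance are illustrative assumptions, not part of the dataset tooling.

```python
import wave

# Format values taken from the dataset description above.
EXPECTED_RATE_HZ = 44100
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit depth
EXPECTED_CHANNELS = 1            # mono
MAX_DURATION_S = 30.0


def describe_wav(path):
    """Return sampling rate, sample width, channel count and duration of a WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "rate_hz": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "channels": wav.getnchannels(),
            "duration_s": frames / rate,
        }


def matches_stated_format(info, tolerance_s=0.01):
    """Check one file's info against the stated format (tolerance is an assumption)."""
    return (
        info["rate_hz"] == EXPECTED_RATE_HZ
        and info["sample_width_bytes"] == EXPECTED_SAMPLE_WIDTH_BYTES
        and info["channels"] == EXPECTED_CHANNELS
        and info["duration_s"] <= MAX_DURATION_S + tolerance_s
    )
```

For example, `matches_stated_format(describe_wav("audio/<sound_id>.wav"))` would be expected to return `True` for a file from the audio folder, where `<sound_id>` stands for any Freesound ID in the dataset.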

All sounds have been manually labeled by human annotators, with an estimated error rate of ~1%. The dataset categorizes the sounds into 23 classes, which are the second-level categories of the Broad Sound Taxonomy (see details below). The annotated data has a non-uniform distribution across these categories. For each audio file, the dataset includes the assigned category label, descriptive metadata (title, tags), provenance information (ID, uploader), and the license, all provided in BSD10k_metadata.csv. For more details on the dataset creation and its contents, please refer to our paper "Heterogeneous Sound Classification with the Broad Sound Taxonomy and Dataset", specifically Section 3.1. An overview of the BSD10k dataset is also available on the support site.

 

Taxonomy

The Broad Sound Taxonomy (BST) organizes sounds into a two-level hierarchical structure with 5 top-level and 23 second-level categories. The top-level categories cover distinct types of sounds: Music, Instrument samples, Speech, Sound effects, and Soundscapes. The taxonomy is designed to classify any type of sound while remaining broad, comprehensive, and easy to use. It can be used for organizing and filtering sounds in heterogeneous sound collections, such as Freesound, as well as in personal sound libraries. More details about the categories can be found in BST_description.csv, and additional information about the taxonomy is provided in our upcoming journal paper "A General-Purpose Sound Taxonomy for the Classification of Heterogeneous Sound Collections".

 

Citation

When using all or part of the BSD10k dataset, please cite our paper (available from [UPF e-repositori] [arXiv] [DCASE2024 proceedings]):

@inproceedings{anastasopoulou2024heterogeneous,
  title = {Heterogeneous Sound Classification with the {{Broad Sound Taxonomy}} and {{Dataset}}},
  author = {Anastasopoulou, Panagiota and Torrey, Jessica and Serra, Xavier and Font, Frederic},
  booktitle = {Workshop on {{Detection}} and {{Classification}} of {{Acoustic Scenes}} and {{Events}} ({{DCASE}})},
  year = {2024}
}

 

License

BSD10k is released in its entirety under the CC BY 4.0 license. We note, though, that each audio file is released under its own Creative Commons (CC) license, as defined by the respective uploader in Freesound. Some sounds require attribution to their original authors, while others forbid commercial reuse. If the dataset is used in a commercial setting, the sounds with CC BY-NC licenses should be excluded.

This is the distribution of sounds per license:

  • CC0: 3,187
  • CC BY: 5,534
  • CC BY-NC: 1,192
  • CC Sampling+: 396

A link to the license deed of each sound is provided in BSD10k_metadata.csv.
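The exclusion of CC BY-NC sounds for commercial settings can be sketched as below. This is an illustrative example, not an official tool: it assumes that the `license` field holds a standard creativecommons.org deed URL (for instance one containing `/by-nc/`), and the function names are hypothetical. It applies only the exclusion named above; other license terms, such as attribution, still need to be respected.

```python
from collections import Counter


def license_family(license_url):
    """Map a Creative Commons deed URL to a coarse license family.

    Assumption: URLs follow the usual creativecommons.org path layout.
    """
    url = license_url.lower()
    if "publicdomain/zero" in url:
        return "CC0"
    if "/by-nc" in url:  # checked before plain BY
        return "CC BY-NC"
    if "sampling" in url:
        return "CC Sampling+"
    if "/by/" in url:
        return "CC BY"
    return "unknown"


def license_counts(rows):
    """Count sounds per license family; rows are dicts with a 'license' key."""
    return Counter(license_family(row["license"]) for row in rows)


def exclude_noncommercial(rows):
    """Drop rows whose license is CC BY-NC, keeping all other rows."""
    return [row for row in rows if license_family(row["license"]) != "CC BY-NC"]
```

The `rows` argument can be the list of dictionaries produced by `csv.DictReader` over BSD10k_metadata.csv. If the URL assumption holds, `license_counts` should reproduce the distribution listed above.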

 

Data structure

BSD10k is organized as follows:

root/
├── audio/                     Audio files
├── metadata/                  Metadata files
│   ├── BSD10k_metadata.csv    Dataset's metadata
│   ├── BST_description.csv    Taxonomy information
│   └── BST_diagram.png        Taxonomy diagram
└── README.md                  Documentation (that you are now reading)

BSD10k_metadata.csv is the main metadata file, containing annotations and additional information for each sound. Each row corresponds to one sound and includes the following fields:

  • sound_id: Freesound ID used as the unique identifier of the sound. The audio files found in the audio folder are named using this ID, with a .wav extension for the audio format.
  • class: Second-level class code of the sound.
  • class_idx: Second-level class index (0-22), ordered according to the taxonomy.
  • class_top: Corresponding top-level class code.
  • uploader: User who uploaded the sound to Freesound.
  • license: Link to the license of the sound.
  • title: Sound title provided by the uploader.
  • tags: Tags associated with the sound provided by the uploader.

The mapping of class codes to their corresponding full class names can be found in BST_description.csv, which also includes a description and examples for each class. A diagram of the taxonomy (BST_diagram.png) is also included for a quick overview of the categories.
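The fields described above can be read with Python's standard `csv` module. The sketch below assumes that BSD10k_metadata.csv is a comma-separated, UTF-8 file whose header row uses exactly the field names listed above; the function names are illustrative only.

```python
import csv
from collections import Counter, defaultdict
from pathlib import Path


def read_metadata(metadata_csv):
    """Read the metadata file into a list of dicts, one per sound."""
    with open(metadata_csv, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def audio_path(root, row):
    """Build the path of a sound's audio file: <root>/audio/<sound_id>.wav."""
    return Path(root) / "audio" / f"{row['sound_id']}.wav"


def class_counts(rows):
    """Count sounds per second-level class code."""
    return Counter(row["class"] for row in rows)


def classes_by_top_level(rows):
    """Group second-level class codes under their top-level class code."""
    groups = defaultdict(set)
    for row in rows:
        groups[row["class_top"]].add(row["class"])
    return {top: sorted(codes) for top, codes in groups.items()}
```

A typical use would be `rows = read_metadata("metadata/BSD10k_metadata.csv")` followed by `class_counts(rows)` to inspect the non-uniform class distribution, with the root directory passed to `audio_path` to locate each file.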

 

Acknowledgments

This research is partially funded by the Generalitat de Catalunya (2023FI-100252, Joan Oró program) and the IA y Música Cátedra (TSI-100929-2023-1, Cátedras ENIA 2022, SE Digitalización e IA, EU NGEU).

 

Contact

If you have any questions, you are welcome to contact Panagiota Anastasopoulou at panagiota.anastasopoulou@upf.edu.

Files

Files (7.9 GB)

Size      Checksum
7.9 GB    md5:4ddf08edfeb65b1e59b2d07fdea415ea
1.8 MB    md5:7569ecaae70e0ec5eb05c1b59196eef3
5.9 kB    md5:8371e288b86aab25a33c3f972654a4f9
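A downloaded file can be compared against the MD5 checksums listed above. The following is a minimal sketch using Python's standard `hashlib`; the function names and the chunk size are illustrative choices, and the local file path is whatever name the file was saved under.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with an expected value such as 'md5:4ddf...'."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use low, which matters for the 7.9 GB audio archive.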

Additional details

Related works

Is supplement to
Conference paper: 10230/68432 (Handle)
Is supplemented by
Workflow: https://github.com/allholy/BSD10k (URL)