Dataset Open Access

GISE-51

Yadav, Sarthak; Foster, Mary Ellen

GISE-51 is an open dataset of 51 isolated sound events based on the FSD50K dataset. The release also includes the GISE-51-Mixtures subset, a dataset of 5-second soundscapes with up to three sound events synthesized from GISE-51. The GISE-51 release attempts to address some of the shortcomings of recent sound event datasets, providing an open, reproducible benchmark for future research and the freedom to adapt the included isolated sound events for domain-specific applications, which was not possible using existing large-scale weakly labelled datasets. The GISE-51 release also includes accompanying code for baseline experiments, which can be found at https://github.com/SarthakYadav/GISE-51-pytorch.

Citation

If you use the GISE-51 dataset and/or the released code, please cite our paper:

Sarthak Yadav and Mary Ellen Foster, "GISE-51: A scalable isolated sound events dataset", arXiv:2103.12306, 2021

Since GISE-51 is based on FSD50K, if you use GISE-51 please also cite the FSD50K paper:

Eduardo Fonseca, Xavier Favory, Jordi Pons, Frederic Font, Xavier Serra. "FSD50K: an Open Dataset of Human-Labeled Sound Events", arXiv:2010.00475, 2020.

About GISE-51 and GISE-51-Mixtures

The following sections summarize key characteristics of the GISE-51 and the GISE-51-Mixtures datasets, including details left out from the paper.

GISE-51

  • Three subsets: train, val and eval with 12465, 1716, and 2176 utterances, respectively. The subsets are consistent with the FSD50K release.
  • Encompasses 51 sound classes from the FSD50K release.
  • See meta/lbl_map.json for the complete vocabulary.
  • The dataset was obtained from FSD50K using the following steps:
    • Unsmearing annotations to obtain single instances with a single label using the provided metadata and ground truth in FSD50K. 
    • Manual inspection to qualitatively evaluate shortlisted utterances. 
    • Volume-threshold-based automated silence filtering using sox. Different volume thresholds were selected for the various sound event class bins by trial and error. The file silence_thresholds.txt lists the class bins and their corresponding volume thresholds. Files that sox determined to contain no audio at all were manually clipped. Code for performing silence filtering can be found in scripts/strip_silence_sox.py in the code repository.
    • Re-evaluating sound event classes, removing ones with too few samples and merging those with high inter-class ambiguity.
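
The silence filtering step above can be illustrated with a minimal sketch. It is not the released scripts/strip_silence_sox.py; it only shows one common way to trim leading and trailing silence with the sox `silence` effect. The bin names and threshold values in EXAMPLE_THRESHOLDS and the function names are hypothetical placeholders; the actual thresholds are the ones in silence_thresholds.txt.

```python
import shutil
import subprocess

# Hypothetical per-bin thresholds (percent of full scale). The real values
# are listed in silence_thresholds.txt and are not reproduced here.
EXAMPLE_THRESHOLDS = {"default": 1.0, "quiet_events": 0.5}


def build_sox_trim_command(src, dst, threshold_pct, min_duration=0.1):
    """Build a sox command that trims leading and trailing silence.

    The `silence 1 <duration> <threshold>%` effect removes audio from the
    start until the level exceeds the threshold; reversing the signal and
    applying it again trims the end as well.
    """
    trim = ["silence", "1", str(min_duration), f"{threshold_pct}%"]
    return ["sox", src, dst, *trim, "reverse", *trim, "reverse"]


def strip_silence(src, dst, class_bin="default"):
    """Run sox on one file using the threshold of the given class bin."""
    if shutil.which("sox") is None:
        raise RuntimeError("sox is not installed or not on PATH")
    cmd = build_sox_trim_command(src, dst, EXAMPLE_THRESHOLDS[class_bin])
    subprocess.run(cmd, check=True)
```

Calling build_sox_trim_command("in.wav", "out.wav", 1.0) only returns the argument list, which makes it easy to inspect the command before running it on a whole subset.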

GISE-51-Mixtures

  • Synthetic 5-second soundscapes with up to 3 events created using Scaper.
  • Weighted sampling with replacement for sound event selection, effectively oversampling events with very few samples. The resulting synthetic soundscapes have a near-equal number of annotations per sound event.
  • The val and eval sets contain 10000 soundscapes each.
  • The final train set contains 60000 soundscapes. We also provide training sets with 5k-100k soundscapes.
  • GISE-51-Mixtures is our proposed subset that can be used to benchmark the performance of future works.
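
The weighted sampling with replacement mentioned above can be sketched as follows. The exact weighting scheme used for GISE-51-Mixtures is not stated here, so this sketch assumes inverse class frequency weights, which is one common choice; the function names and the toy data are illustrative only.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Give each clip a weight of 1 / (number of clips with its label)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


def sample_events(clips, labels, k, seed=0):
    """Draw k (clip, label) pairs with replacement.

    With inverse-frequency weights, every class has the same total
    probability mass, so rare classes are oversampled.
    """
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    return rng.choices(list(zip(clips, labels)), weights=weights, k=k)


if __name__ == "__main__":
    # Toy example: 90 clips of one class, 10 of another.
    labels = ["dog"] * 90 + ["siren"] * 10
    clips = [f"clip_{i}.wav" for i in range(len(labels))]
    drawn = sample_events(clips, labels, k=10000)
    print(Counter(label for _, label in drawn))
```

In the toy example both classes end up with roughly the same number of draws even though one has nine times as many source clips.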

LICENSE

All audio clips (i.e., those found in isolated_events.tar.gz) used in the preparation of the Glasgow Isolated Events Dataset (GISE-51) are designated Creative Commons and were obtained from FSD50K. The source data in isolated_events.tar.gz is based on the FSD50K dataset, which is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) License.

The GISE-51 dataset (including GISE-51-Mixtures) is a curated, processed and generated preparation, and is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) License. The license is specified in the LICENSE-DATASET file in license.tar.gz.

Baselines

Several sound event recognition experiments were conducted, establishing baseline performance for several prominent convolutional neural network architectures. The experiments are described in Section 4 of our paper, and the implementation for reproducing these experiments is available at https://github.com/SarthakYadav/GISE-51-pytorch.

Files

GISE-51 is available as a collection of several tar archives. All audio files are PCM 16 bit, 22050 Hz. The following list describes the contents of these files in detail:

  • isolated_events.tar.gz: The core GISE-51 isolated events dataset containing train, val and eval subfolders.
  • meta.tar.gz: contains lbl_map.json
  • noises.tar.gz: contains background noises used for GISE-51-Mixtures soundscape generation
  • mixtures_jams.tar.gz: This file contains annotation files in .jams format that, alongside isolated_events.tar.gz and noises.tar.gz can be reused to generate exact GISE-51-Mixtures soundscapes. (Optional, we provide the complete set of GISE-51-Mixtures soundscapes as independent tar archives.)
  • train.tar.gz: GISE-51-Mixtures train set, containing 60k synthetic soundscapes.
  • val.tar.gz: GISE-51-Mixtures val set, containing 10k synthetic soundscapes.
  • eval.tar.gz: GISE-51-Mixtures eval set, containing 10k synthetic soundscapes.
  • train_*.tar.gz: These are tar archives containing training mixtures with varying numbers of soundscapes, used primarily in Section 4.1 of the paper, which compares val mAP performance vs. the number of training soundscapes. A helper script, prepare_mixtures_lmdb.sh, is provided in the code release to prepare data for the experiments in Section 4.1.
  • pretrained-models.tar.gz: Contains model checkpoints for all experiments conducted in the paper. More information on these checkpoints can be found in the code release README.
    • experiments_60k_mixtures: model checkpoints from section 4.2 of the paper.
    • exported_weights_60k: ResNet-18 and EfficientNet-B1 exported as plain state_dicts for use with transfer learning experiments.
    • experiments_audioset: checkpoints from AudioSet Balanced (Sec 4.3.1) experiments
    • experiments_vggsound: checkpoints from Section 4.3.2 of the paper
    • experiments_esc50: ESC-50 dataset checkpoints, from Section 4.3.3
  • license.tar.gz: contains dataset license info.
  • silence_thresholds.txt: contains volume thresholds for various sound event bins used for silence filtering.
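
The stated audio format (PCM 16 bit, 22050 Hz) can be spot-checked after extracting an archive. The sketch below uses only the Python standard library and assumes the clips are stored as WAV files, which is not stated explicitly above; the function name is illustrative.

```python
import wave

EXPECTED_RATE = 22050        # Hz, as stated for all audio files
EXPECTED_SAMPLE_WIDTH = 2    # bytes per sample, i.e. 16-bit PCM


def matches_gise51_format(path):
    """Return True if a WAV file is 16-bit PCM sampled at 22050 Hz."""
    with wave.open(path, "rb") as wav:
        return (wav.getframerate() == EXPECTED_RATE
                and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH)
```

Running this over a handful of extracted files is a quick sanity check before feeding them into a training pipeline that assumes a fixed sample rate.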

Contact

In case of queries and clarifications, feel free to contact Sarthak at s.yadav.2@research.gla.ac.uk. (Adding [GISE-51] to the subject of the email would be appreciated!)

Files (37.7 GB)

  • eval.tar.gz (1.1 GB), md5:dfb0a482b9a6037239afa17a6b3ff659
  • isolated_events.tar.gz (2.4 GB), md5:c68db0b56d5cf036ee6811eece327a09
  • license.tar.gz (956 Bytes), md5:945b211c67babb7eae5c4c0e7cb18794
  • meta.tar.gz (700 Bytes), md5:404fbe07e9c429139a9ac2b55ee8931d
  • mixtures_jams.tar.gz (86.5 MB), md5:137e24af27e2950046fc545a0008e40c
  • noises.tar.gz (64.0 MB), md5:326eb670b9ebb94dd9e305375b04692b
  • pretrained-models.tar.gz (2.6 GB), md5:c85648980f4c51b8713d04413988f20d
  • silence_thresholds.txt (149 Bytes), md5:6824cb74b7973036e3c991334b6c047b
  • train.tar.gz (6.7 GB), md5:04fbd1ecafb11db18ed08a45ee2589b9
  • train_10k.tar.gz (1.1 GB), md5:b9b22aeb97a3cf5770b38ec875965b96
  • train_15k.tar.gz (1.7 GB), md5:79eea3db1e7383feaeb563b79ca0bc7c
  • train_20k.tar.gz (2.2 GB), md5:e81cbe59d78b7bbdae72448d0b924b24
  • train_30k.tar.gz (3.4 GB), md5:c41adab3e3dc7ed6df3a782094bbd70c
  • train_40k.tar.gz (4.5 GB), md5:79a7913782454be764ef6247eb78043a
  • train_50k.tar.gz (5.6 GB), md5:fbeee4bbf155ad1f7a9ac6bfef6ae564
  • train_5k.tar.gz (563.2 MB), md5:a9f22d5f5abe64211af154e6f8c8cd21
  • train_p2.tar.gz (1.1 GB), md5:5abb0869bc91146970b0f3409c9b66af
  • train_p3.tar.gz (1.1 GB), md5:58ae8d088f05aae4727e4fc10230735c
  • train_p4.tar.gz (1.1 GB), md5:c5955b9c995536f572037eea38f57d00
  • train_p5.tar.gz (1.1 GB), md5:ca8e55bf78e87e696dfdf438520369f3
  • val.tar.gz (1.1 GB), md5:c13250768d4e3e149616ccfe548b6f40
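
The md5 checksums in the file listing can be used to verify a download before extracting it. The following is a minimal sketch using Python's hashlib; the function names are illustrative and not part of the code release.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Compare a file's md5 against a listed value such as 'md5:abc...'."""
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, verify_download("eval.tar.gz", "md5:dfb0a482b9a6037239afa17a6b3ff659") should return True for an intact copy of the eval archive. Reading in chunks keeps memory use flat even for the multi-gigabyte archives.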