Published March 23, 2021 | Version v1
Dataset Open


  • 1. University of Glasgow


GISE-51 is an open dataset of 51 isolated sound events based on the FSD50K dataset. The release also includes the GISE-51-Mixtures subset, a dataset of 5-second soundscapes with up to three sound events synthesized from GISE-51. The GISE-51 release attempts to address some of the shortcomings of recent sound event datasets, providing an open, reproducible benchmark for future research and the freedom to adapt the included isolated sound events for domain-specific applications, which was not possible using existing large-scale weakly labelled datasets. The GISE-51 release also includes accompanying code for baseline experiments, which can be found at


If you use the GISE-51 dataset and/or the released code, please cite our paper:

Sarthak Yadav and Mary Ellen Foster, "GISE-51: A scalable isolated sound events dataset", arXiv:2103.12306, 2021

Since GISE-51 is based on FSD50K, if you use GISE-51 kindly also cite the FSD50K paper:

Eduardo Fonseca, Xavier Favory, Jordi Pons, Frederic Font, Xavier Serra. "FSD50K: an Open Dataset of Human-Labeled Sound Events", arXiv:2010.00475, 2020.

About GISE-51 and GISE-51-Mixtures

The following sections summarize key characteristics of the GISE-51 and the GISE-51-Mixtures datasets, including details left out from the paper.

GISE-51


  • Three subsets: train, val and eval, with 12465, 1716, and 2176 utterances, respectively. The subsets are consistent with the FSD50K release.
  • Encompasses 51 sound classes from the FSD50K release
  • View meta/lbl_map.csv for the complete vocabulary.
  • The dataset was obtained from FSD50K using the following steps:
    • Unsmearing annotations to obtain single instances with a single label using the provided metadata and ground truth in FSD50K. 
    • Manual inspection to qualitatively evaluate shortlisted utterances. 
    • Volume-threshold-based automated silence filtering using sox. Different volume thresholds were selected for the various sound event class bins by trial and error; silence_thresholds.txt lists the class bins and their corresponding volume thresholds. Files that sox determined to contain no audio at all were manually clipped. Code for performing silence filtering can be found in scripts/ in the code repository.
    • Re-evaluating sound event classes, removing those with too few samples and merging those with high inter-class ambiguity.
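
The idea behind volume-threshold silence filtering can be illustrated with a minimal Python sketch. This is not the sox-based pipeline used for the release (that code lives in scripts/ in the code repository); it is a plain amplitude-threshold trim on normalized samples. The function name, the bin names, and the threshold values are hypothetical and do not reflect the contents of silence_thresholds.txt.

```python
from typing import List, Sequence


def trim_silence(samples: Sequence[float], threshold: float) -> List[float]:
    """Drop leading and trailing samples whose magnitude is below `threshold`.

    Assumes `samples` are normalized to [-1.0, 1.0] and `threshold` is a
    fraction of full scale. Illustrative only; not the sox-based procedure.
    """
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        # Nothing exceeds the threshold. This sketch returns an empty clip;
        # in the actual release such files were handled manually.
        return []
    return list(samples[loud[0]:loud[-1] + 1])


# Hypothetical per-bin thresholds, standing in for the per-class-bin values.
EXAMPLE_THRESHOLDS = {"quiet_events": 0.01, "loud_events": 0.05}

clip = [0.0, 0.001, 0.2, -0.3, 0.004, 0.1, 0.0, 0.0]
trimmed = trim_silence(clip, EXAMPLE_THRESHOLDS["quiet_events"])
```

Quiet samples between two loud ones are kept, so only the edges of a clip are trimmed.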


GISE-51-Mixtures

  • Synthetic 5-second soundscapes with up to 3 events, created using Scaper.
  • Weighted sampling with replacement for sound event selection, effectively oversampling events with very few samples. The resulting synthetic soundscapes have a near-equal number of annotations per sound event.
  • The number of soundscapes in the val and eval sets is 10000 each.
  • The number of soundscapes in the final train set is 60000. We also provide training sets ranging from 5k to 100k soundscapes.
  • GISE-51-Mixtures is our proposed subset that can be used to benchmark the performance of future works.
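
One way to picture weighted sampling with replacement is the sketch below, which weights each clip by the inverse of its class frequency so that every class is drawn about equally often. The exact weighting scheme used to build GISE-51-Mixtures is not stated here, so inverse-frequency weights are an assumption, and the function names and the toy clip list are hypothetical.

```python
import random
from collections import Counter
from typing import List


def inverse_frequency_weights(labels: List[str]) -> List[float]:
    """Give each clip a weight of 1 / (number of clips sharing its label)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


def sample_events(clips: List[str], labels: List[str], k: int,
                  rng: random.Random) -> List[str]:
    """Draw `k` clips with replacement, oversampling rare classes."""
    weights = inverse_frequency_weights(labels)
    return rng.choices(clips, weights=weights, k=k)


# Toy example: a frequent class (90 clips) and a rare class (10 clips).
labels = ["dog"] * 90 + ["bell"] * 10
clips = [f"{label}_{i}.wav" for i, label in enumerate(labels)]
picked = sample_events(clips, labels, k=20000, rng=random.Random(0))
bell_fraction = sum(name.startswith("bell") for name in picked) / len(picked)
```

Because the weights within each class sum to 1, the rare class ends up with roughly the same number of draws as the frequent one, and individual rare clips are reused many times.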


All audio clips (i.e., those found in isolated_events.tar.gz) used in the preparation of the Glasgow Isolated Events Dataset (GISE-51) are designated Creative Commons and were obtained from FSD50K. The source data in isolated_events.tar.gz is based on the FSD50K dataset, which is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) License.

The GISE-51 dataset (including GISE-51-Mixtures) is a curated, processed and generated preparation, and is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) License. The license is specified in the LICENSE-DATASET file in license.tar.gz.


Several sound event recognition experiments were conducted, establishing baseline performance on several prominent convolutional neural network architectures. The experiments are described in Section 4 of our paper, and the implementation for reproducing these experiments is available at


GISE-51 is available as a collection of several tar archives. All audio files are PCM 16 bit, 22050 Hz. The following list describes the contents of these files in detail:

  • isolated_events.tar.gz: The core GISE-51 isolated events dataset containing train, val and eval subfolders.
  • meta.tar.gz: contains lbl_map.json
  • noises.tar.gz: contains background noises used for GISE-51-Mixtures soundscape generation
  • mixtures_jams.tar.gz: contains annotation files in .jams format that, alongside isolated_events.tar.gz and noises.tar.gz, can be reused to regenerate the exact GISE-51-Mixtures soundscapes. (Optional; we provide the complete set of GISE-51-Mixtures soundscapes as independent tar archives.)
  • train.tar.gz: GISE-51-Mixtures train set, containing 60k synthetic soundscapes.
  • val.tar.gz: GISE-51-Mixtures val set, containing 10k synthetic soundscapes.
  • eval.tar.gz: GISE-51-Mixtures eval set, containing 10k synthetic soundscapes.
  • train_*.tar.gz: tar archives containing training mixtures with varying numbers of soundscapes, used primarily in Section 4.1 of the paper, which compares val mAP performance against the number of training soundscapes. A helper script is provided in the code release to prepare data for the experiments in Section 4.1.
  • pretrained-models.tar.gz: Contains model checkpoints for all experiments conducted in the paper. More information on these checkpoints can be found in the code release README.
    • experiments_60k_mixtures: model checkpoints from section 4.2 of the paper.
    • exported_weights_60k: ResNet-18 and EfficientNet-B1 exported as plain state_dicts for use with transfer learning experiments.
    • experiments_audioset: checkpoints from AudioSet Balanced (Sec 4.3.1) experiments
    • experiments_vggsound: checkpoints from Section 4.3.2 of the paper
    • experiments_esc50: ESC-50 dataset checkpoints, from Section 4.3.3
  • license.tar.gz: contains dataset license info.
  • silence_thresholds.txt: contains volume thresholds for various sound event bins used for silence filtering.
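
After extracting the archives, the stated audio format (PCM 16 bit, 22050 Hz) can be sanity-checked with Python's standard library. The sketch below assumes the audio files are stored in WAV containers, which the description above does not state explicitly; the function and constant names are hypothetical.

```python
import wave

EXPECTED_SAMPLE_WIDTH_BYTES = 2   # 16-bit PCM
EXPECTED_SAMPLE_RATE_HZ = 22050


def matches_gise51_format(path: str) -> bool:
    """Return True if the WAV file at `path` is 16-bit PCM sampled at 22050 Hz.

    Assumes a WAV container; wave.open raises wave.Error for other formats.
    """
    with wave.open(path, "rb") as wav:
        return (wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
                and wav.getframerate() == EXPECTED_SAMPLE_RATE_HZ)
```

Running such a check over an extracted folder before training is a cheap way to catch files that were resampled or converted by mistake.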


In case of queries and clarifications, feel free to contact Sarthak at (Adding [GISE-51] to the subject of the email would be appreciated!)



Files (37.7 GB)
