Planned intervention: On Wednesday April 3rd 05:30 UTC Zenodo will be unavailable for up to 2-10 minutes to perform a storage cluster upgrade.

There is a newer version of the record available.

Published October 6, 2017 | Version v1
Dataset Open

URBAN-SED

Description

DESCRIPTION

URBAN-SED is a dataset of 10,000 soundscapes with sound event annotations generated using scaper (github.com/justinsalamon/scaper).

A detailed description of the dataset is provided in the following article:

Scaper: A Library for Soundscape Synthesis and Augmentation
J. Salamon, D. MacConnell, M. Cartwright, P. Li, and J. P. Bello.
In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, Oct. 2017.
(PDF: https://goo.gl/RsfRhP)

A summary is provided here:

  • The dataset includes 10,000 soundscapes, totals almost 30 hours and includes close to 50,000 annotated sound events
  • Complete annotations are provided in JAMS format, and simplified annotations are provided as tab-separated text files
  • Every soundscape is 10 seconds long and has a background of Brownian noise resembling the typical "hum" often heard in urban environments
  • Every soundscape contains between 1-9 sound events from the following classes:
    • air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren and street_music
  • The source material for the sound events are the clips from the UrbanSound8K dataset (https://serv.cusp.nyu.edu/projects/urbansounddataset/)
  • URBAN-SED comes pre-sorted into three sets: train, validate and test:
    • There are 6000 soundscapes in the training set, generated using clips from folds 1-6 in UrbanSound8K
    • There are 2000 soundscapes in the validation set,  generated using clips from folds 7-8 in UrbanSound8K
    • There are 2000 soundscapes in the test set, generated using clips from folds 9-10 in UrbanSound8K
  • Further details about how the soundscapes were generated including the distribution of sound event start times, durations, signal-to-noise ratios, pitch shifting, time stretching, and the range of sound event polyphony (overlap) can be found in Section 3 of the scaper paper: https://goo.gl/RsfRhP
  • The scripts used to generated URBAN-SED using scaper can be found here: https://github.com/justinsalamon/scaper_waspaa2017/tree/master/notebooks

AUDIO FILES INCLUDED

* 10,000 synthesized soundscapes in single channel (mono), 44100Hz, 16-bit, WAV format.
* The files are split into a training set (6000), validation set (2000) and test set (2000).

ANNOTATION FILES INCLUDED

The annotations list the sound events that occur in every soundscape. The annotations are "strong", meaning for every 
sound event the annotations include (at least) the start time, end time, and label of the sound event. Sound events 
come from the following 10 labels (categories):
    * air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, 
      siren, street_music

There are two types of annotations: full annotations in JAMS format, and simplified annotations in 
tab-separated txt format.

JAMS Annotations
-------------------------
* The full annotations are distributed in JAMS format (https://github.com/marl/jams).
* There are 10,000 JAMS annotation files, each one corresponding to a single soundscape with the same filename (other than the extension)
* Each JAMS file contains a signle annotation in scaper's custom sound_event namespace - installing scaper (pip install scaper)
  and importing it (import scaper) is required in order to load the annotation into python with jams (import jams):
  jam = jams.load('soundscape_train_bimodal0.jams').
* The value of each observation (sound event) is a dictionary storing all scaper-related sound event parameters:
    * label, source_file, source_time, event_time, event_duration, snr, role, pitch_shift, time_stretch.
    * Note: the event_duration stored in the value dictionary represents the specified duration prior to any time 
      stretching. The actual event durtation in the soundscape is stored in the duration field of the JAMS observation.
* The observations (sound events) in the JAMS annotation include both foreground sound events and the background(s).
* The probabilistic scaper foreground and background event specifications are stored in the annotation's sandbox, allowing
  a complete reconstruction of the soundscape audio from the JAMS annotation (assuming access to the original source material)
  using scaper.generate_from_jams('soundscape_train_bimodal0.jams').
* The annotation sandbox also includes additional metadata such as the total number of foreground sound events, the 
  maximum polyphony (sound event overlap) of the soundscape and its gini coefficient (a measure of soundscape complexity).

Simplified Annotations
------------------------------
* The simplified annotations are distributed as tab-separated text files.
* There are 10,000 simplified annotation files, each one corresponding to a single soundscape with the same filename (other than the extension)
* Each simplified annotation has a 3-column format (no header): start_time, end_time, label.
* Background sounds are NOT included in the simplified annotations (only foreground sound events)
* No additional information is stored in the simplified events (see the JAMS annotations for more details).

Please acknowledge this dataset in academic research

We would highly appreciate it if scientific publications of work partly based on URBAN-SED and/or scaper cite the aforementioned publication.

The creation of this dataset was supported by NSF award 1544753.

Files

Files (6.6 GB)

Name Size Download all
md5:e51ab426ccb256b6c5da4234afc07ae2
6.6 GB Download