Title: MAESTRO Synthetic - Multiple Annotator Estimated STROng labels

MAESTRO Synthetic

Machine Listening Group, Tampere University https://research.tuni.fi/machinelistening/

Authors

1. Dataset

MAESTRO Synthetic contains 20 synthetic audio files, each 3 minutes long, created using Scaper. The dataset was created for studying annotation procedures for strong labels using crowdsourcing.

The audio files contain sounds from the following classes:

  • car_horn
  • children_voices
  • dog_bark
  • engine_idling
  • siren
  • street_music

Audio files contain excerpts of recordings uploaded to freesound.org. Please see FREESOUNDCREDITS.txt for an attribution list. 

Audio files are generated using Scaper, with small changes to the default synthesis procedure: sounds are placed at random intervals while keeping the maximum polyphony at 2. The interval between two consecutive events is chosen at random, but limited to 2-10 seconds. Event classes and event instances are chosen uniformly, and mixed over a Brownian-noise background at a signal-to-noise ratio (SNR) drawn at random between 0 and 20 dB. Two overlapping events from the same class are avoided.
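The placement logic above can be sketched in plain Python. This is an illustrative reconstruction, not the actual generation script: the fixed 3-second event duration and the reading of "interval" as onset-to-onset spacing are assumptions, and the real dataset was produced with Scaper.

```python
import random

# Sound classes listed above
CLASSES = ["car_horn", "children_voices", "dog_bark",
           "engine_idling", "siren", "street_music"]

def place_events(total_dur=180.0, event_dur=3.0, seed=0):
    """Sketch of the event placement: 2-10 s between consecutive onsets,
    uniform class choice, SNR in [0, 20] dB, no same-class overlap.
    event_dur and seed are illustrative assumptions, not dataset values."""
    rng = random.Random(seed)
    events = []  # (onset, offset, label, snr)
    onset = rng.uniform(2.0, 10.0)
    while onset + event_dur <= total_dur:
        label = rng.choice(CLASSES)
        # avoid two overlapping events of the same class
        if events and onset < events[-1][1] and label == events[-1][2]:
            label = rng.choice([c for c in CLASSES if c != events[-1][2]])
        snr = rng.uniform(0.0, 20.0)  # SNR over the Brownian-noise background
        events.append((onset, onset + event_dur, label, snr))
        onset += rng.uniform(2.0, 10.0)  # 2-10 s to the next onset
    return events
```

With onsets at least 2 s apart and events no longer than twice that spacing, at most two events can sound at once, which gives the maximum polyphony of 2 described above.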

Annotation procedure

For annotation, each 3-minute file was split into 10-second segments with a hop of one second. Each segment was annotated using crowdsourcing, in a tagging scenario.

For each segment, annotators were asked to select, from the given list of classes, the sounds that were active (audible). Each 10-s segment was annotated by five persons.
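The segmentation described above can be written as a one-liner; a 10-s window sliding with a 1-s hop over a 180-s file yields 171 segments:

```python
def segment_bounds(total_dur=180, win=10, hop=1):
    """Start/end times (in seconds) of the annotation segments:
    a 10-s window sliding with a 1-s hop over a 3-min file."""
    return [(t, t + win) for t in range(0, total_dur - win + 1, hop)]
```

This gives segments (0, 10), (1, 11), ..., (170, 180), matching the per-file segment counts in the correspondence list below.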

Full details on the annotation procedure and the processing of the tags can be found in:

Irene Martin Morato, Manu Harju, and Annamaria Mesaros. Crowdsourcing strong labels for sound event detection, in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2021). New Paltz, NY, Oct 2021. 

Dataset content

The dataset contains: 

  • audio: the 20 synthetic soundscapes, each 3 min long
  • ground truth: the "true" reference annotation created using Scaper, in jams (complete) and txt (simplified) format
  • raw annotations: complete data as annotated by multiple MTurk workers
  • estimated audio tags: tags per 10-s segment, aggregated based on multiple opinions (MACE in the paper)
  • estimated strong labels: outcome of the method (MACE method in the paper)

Files correspondence

Each 3-minute file was split into 10-s segments with a 1-s hop, yielding 171 segments per file. For example, scape_00.wav contains the segments 000000.wav - 000170.wav. The correspondence between them is as follows: 000000.wav starts at offset 0, 000001.wav starts at offset 1 s, 000002.wav starts at offset 2 s, etc.

Full list:

  • scape_00: 000000.wav - 000170.wav
  • scape_01: 000171.wav - 000341.wav
  • scape_02: 000342.wav - 000512.wav
  • scape_03: 000513.wav - 000683.wav
  • scape_04: 000684.wav - 000854.wav
  • scape_05: 000855.wav - 001025.wav
  • scape_06: 001026.wav - 001196.wav
  • scape_07: 001197.wav - 001367.wav
  • scape_08: 001368.wav - 001538.wav
  • scape_09: 001539.wav - 001709.wav
  • scape_10: 001710.wav - 001880.wav
  • scape_11: 001881.wav - 002051.wav
  • scape_12: 002052.wav - 002222.wav
  • scape_13: 002223.wav - 002393.wav
  • scape_14: 002394.wav - 002564.wav
  • scape_15: 002565.wav - 002735.wav
  • scape_16: 002736.wav - 002906.wav
  • scape_17: 002907.wav - 003077.wav
  • scape_18: 003078.wav - 003248.wav
  • scape_19: 003249.wav - 003419.wav
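Because every file contributes exactly 171 segments, the full list above reduces to integer division. A minimal helper (the function name is ours, not part of the dataset):

```python
SEGS_PER_FILE = 171  # (180 - 10) / 1 + 1 ten-second segments per 3-min file

def locate_segment(idx):
    """Map a global segment index (e.g. 342 for 000342.wav) to its
    source soundscape and its start offset in seconds within that file."""
    scape, offset = divmod(idx, SEGS_PER_FILE)
    return f"scape_{scape:02d}", offset
```

For example, segment 000512.wav resolves to scape_02 at offset 170 s, consistent with the ranges listed above.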

File structure

dataset root
│   README.md                   this file
│   FREESOUNDCREDITS.txt        information on the individual sound examples used in the data
│   files_mapping.csv           mapping between freesound id and sound instances extracted from them, format file.wav [tab] label [tab] saliency [tab] freesound_id [tab] start_time [tab] end_time
│
└───audio                   
│   │   scape00.wav     
│   │   scape01.wav
│   │   ...
│
└───estimated_strong_labels         outcome of the method (using MACE)
│   │   mturk_scape00.csv           format: start_time [tab] end_time [tab] label
│   │   mturk_scape01.csv
│   │   ...
│
└───scaper_reference                ground truth created with Scaper (annotations, output from Scaper)
│   │   scape00.jams        
│   │   scape00.txt
│   │   scape01.jams        
│   │   scape01.txt
│   │   ...
└───tags                
│   │   MAESTRO_full_annotations.yaml   complete multi-annotator tags for all 10-s segments
│   │   MAESTRO_labels_mace100.csv      aggregated tags per segment, based on multiple annotations (using MACE); format: filename [tab] tag1,tag2,...

2. License

The license permits free academic use. Any commercial use is strictly prohibited; for commercial use, contact the dataset authors.

Copyright (c) 2020 Tampere University and its licensors
All rights reserved.
Permission is hereby granted, without written agreement and without license or royalty
fees, to use and copy the MAESTRO Synthetic - Multi Annotator Estimated Strong Labels (“Work”) described in this document
and composed of audio and metadata. This grant is only for experimental and non-commercial
purposes, provided that the copyright notice in its entirety appear in all copies of this Work,
and the original source of this Work, (MAchine Listening Group at Tampere University),
is acknowledged in any publication that reports research using this Work.
Any commercial use of the Work or any part thereof is strictly prohibited.
Commercial use includes, but is not limited to:
- selling or reproducing the Work
- selling or distributing the results or content achieved by use of the Work
- providing services by using the Work.

IN NO EVENT SHALL TAMPERE UNIVERSITY OR ITS LICENSORS BE LIABLE TO ANY PARTY
FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE
OF THIS WORK AND ITS DOCUMENTATION, EVEN IF TAMPERE UNIVERSITY OR ITS
LICENSORS HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

TAMPERE UNIVERSITY AND ALL ITS LICENSORS SPECIFICALLY DISCLAIMS ANY
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS FOR A PARTICULAR PURPOSE. THE WORK PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND
THE TAMPERE UNIVERSITY HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT,
UPDATES, ENHANCEMENTS, OR MODIFICATIONS.