MAESTRO Real - Multi-Annotator Estimated Strong Labels
Description
The dataset was created for studying estimation of strong labels using crowdsourcing.
It contains 49 real-life audio files from 5 different acoustic scenes, together with the annotation outcome. Annotation was performed using Amazon Mechanical Turk. The total duration of the dataset is 189 minutes and 52 seconds.
The audio files are a subset of the TUT Acoustic Scenes 2016 dataset and belong to five acoustic scenes: cafe/restaurant, city center, grocery store, metro station and residential area. Each scene has 6 classes, some of which are common to all the scenes, resulting in 17 classes in total.
The dataset contains:
- audio: the 49 real-life recordings, each 3 to 5 minutes long.
- soft labels: strong labels estimated from the crowdsourced data; values between 0 and 1 indicate the uncertainty of the annotators.
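Because the soft labels are values between 0 and 1, a common first step is to threshold them into binary activity. The following is a minimal Python sketch of that idea, not the authors' method. It makes several assumptions: the row layout (onset, offset, event label, soft value) is hypothetical, since the file format inside development_annotation.zip is not described here; the class names and values are made up for illustration; the 0.5 threshold is arbitrary; and higher values are treated as stronger annotator support.

```python
from typing import List, Tuple

# Hypothetical in-memory row: (onset_s, offset_s, event_label, soft_value).
# This layout is an assumption; check the annotation files for the real format.
SoftRow = Tuple[float, float, str, float]
HardRow = Tuple[float, float, str]


def binarize(rows: List[SoftRow], threshold: float = 0.5) -> List[HardRow]:
    """Keep the segments whose soft value is at or above the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    return [(on, off, label) for on, off, label, value in rows if value >= threshold]


# Made-up example rows, not taken from the dataset.
example_rows: List[SoftRow] = [
    (0.0, 1.0, "footsteps", 0.8),
    (1.0, 2.0, "footsteps", 0.4),
    (0.0, 1.0, "car", 0.5),
]

if __name__ == "__main__":
    for onset, offset, label in binarize(example_rows):
        print(f"{onset:.1f}\t{offset:.1f}\t{label}")
```

Lowering the threshold keeps more low-agreement segments, and raising it keeps only those with stronger support. Which value is appropriate depends on the task.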
For more details about the real-life recordings, please see the following paper:
A. Mesaros, T. Heittola and T. Virtanen, "TUT database for acoustic scene classification and sound event detection," 2016 24th European Signal Processing Conference (EUSIPCO), 2016, pp. 1128-1132.
Files
development_annotation.zip
Additional details
Funding
- Research Council of Finland: Teaching machines to listen (332063)