DAACI-VoDAn Dataset
- 1. DAACI Ltd.
- 2. Time Machine Capital 2 Ltd.
Description
DAACI-VoDAn is a dataset for vocal detection that comprises manually-generated vocal activity annotations for 706 full-length music tracks, as well as their associated metadata including song title, artist, and YouTube URL. This data repository contains the metadata file and the annotations, all as CSV files.
The music tracks in DAACI-VoDAn were selected to cover a broad variety of genres, eras, and instrumentation, and are intended to be representative of the music available through commercial catalogues and streaming services.
Citing, license, and using the dataset
This dataset is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License and is intended for research purposes only.
DAACI-VoDAn is presented in the following paper, accepted for publication at EUSIPCO (European Signal Processing Conference) 2023. Please cite this paper when using the dataset:
Helena Cuesta, Nadine Kroher, Aggelos Pikrakis, Stojan Djordjevic. DAACI-VoDAn: Improving Vocal Detection with New Data and Methods. To appear in Proceedings of the European Signal Processing Conference (EUSIPCO). 2023.
Annotation format
Vocal regions are provided as one CSV file per track, with the unique track ID as the filename (matching the IDs specified in the metadata file). They were created manually by a team of four annotators following a peer-review strategy, i.e., each annotation was created by one annotator and reviewed by another.
Each row in the CSV files represents a vocal region given by the tuple [start_time, duration] in seconds.
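As an illustration of the row format, the following minimal sketch reads an annotation file and converts each [start_time, duration] row into a (start, end) pair in seconds. The function names are hypothetical and not part of the dataset. The sketch assumes comma-separated values, and because it is not stated here whether the files carry a header row, any non-numeric row is simply skipped.

```python
import csv
from typing import Iterable, List, Tuple


def parse_vocal_regions(lines: Iterable[str]) -> List[Tuple[float, float]]:
    """Convert rows of [start_time, duration] into (start, end) pairs in seconds."""
    regions = []
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or incomplete rows
        try:
            start, duration = float(row[0]), float(row[1])
        except ValueError:
            continue  # assumption: a non-numeric row is a header line
        regions.append((start, start + duration))
    return regions


def load_vocal_regions(path: str) -> List[Tuple[float, float]]:
    """Read one annotation file, e.g. '<track_id>.csv' (hypothetical path)."""
    with open(path, newline="") as f:
        return parse_vocal_regions(f)
```

Working with (start, end) pairs rather than durations tends to make overlap checks and frame-level label generation simpler.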
The task of annotating vocal segments deals with numerous ambiguous scenarios. To provide consistent and extendable annotations, we defined an annotation strategy in the form of a set of rules:
- We defined a minimum inter-region distance of 200 ms. Two vocal regions that are at least 200 ms apart are annotated as separate regions. Shorter vocal rests are ignored, and a single continuous region is annotated instead.
- We annotated heavily processed (e.g., chorus or distortion effects) singing voice sections as vocals, as long as they are still recognisable as originating from a human voice.
- We extended vocal sections to include reverb and delay “tails” as long as they are audible. Similarly, we considered inhalations or “breathing” sounds, e.g., at the beginning of a vocal phrase, as part of the vocal section.
- We considered segments containing speech as vocal sections but excluded whistling. Vocal ensembles and background vocals are also labelled as vocals.
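The 200 ms minimum inter-region distance from the first rule can be applied programmatically, for example to bring model predictions or additional annotations in line with the same convention. The sketch below is one possible way to do this and is not the annotators' own tooling; the function name and the constant are hypothetical. It takes regions as (start_time, duration) pairs in seconds, merges any two regions separated by less than 200 ms, and returns the result in the same format.

```python
from typing import Iterable, List, Tuple

MIN_GAP_S = 0.2  # 200 ms minimum inter-region distance from the annotation rules


def merge_close_regions(
    regions: Iterable[Tuple[float, float]], min_gap: float = MIN_GAP_S
) -> List[Tuple[float, float]]:
    """Merge (start, duration) regions whose separating gap is shorter than min_gap."""
    merged = []  # running list of [start, end] intervals
    for start, duration in sorted(regions):
        end = start + duration
        if merged and start - merged[-1][1] < min_gap:
            # Gap too short (or regions overlap): extend the previous region.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # Gap of at least min_gap: keep as a separate region.
            merged.append([start, end])
    return [(s, e - s) for s, e in merged]
```

For instance, a region ending at 1.0 s followed by one starting at 1.125 s would be merged into a single region, whereas a following region starting 375 ms later would stay separate.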