Published May 31, 2023 | Version 1.0
Dataset Restricted

DAACI-VoDAn Dataset

  • 1. DAACI Ltd.
  • 2. Time Machine Capital 2 Ltd.

Description

DAACI-VoDAn is a dataset for vocal detection that comprises manually generated vocal activity annotations for 706 full-length music tracks, as well as their associated metadata, including song title, artist, and YouTube URL. This data repository contains the metadata file, the annotations as CSV files, and a pickle file with the proposed data partitions: train (80%), validation (10%), and test (10%).
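The internal layout of the partition pickle file is not documented in this description. The following is a minimal sketch for summarising it, assuming (hypothetically) that it stores a dict mapping partition names to lists of track IDs; the key names and IDs in the example are made up for illustration.

```python
import pickle


def summarise_partitions(obj):
    """Return {partition name: number of items} for a dict of sized collections."""
    if not isinstance(obj, dict):
        raise TypeError(f"expected a dict, got {type(obj).__name__}")
    return {name: len(items) for name, items in obj.items()}


# Hypothetical stand-in for the contents of the partition file.
example = {"train": ["id1", "id2", "id3"], "validation": ["id4"], "test": ["id5"]}

# Round trip through pickle, mimicking what pickle.load would return from a file.
restored = pickle.loads(pickle.dumps(example))
print(summarise_partitions(restored))  # {'train': 3, 'validation': 1, 'test': 1}
```

With the real file, the object would instead come from `pickle.load(f)` on a file opened in binary mode (`"rb"`). Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.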

The music tracks in DAACI-VoDAn were selected to cover a broad variety of genres, eras, and instrumentations, and are intended to be representative of the music available through commercial catalogues and streaming services.

Citing, license, and using the dataset

This dataset is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License and it is intended to be used only for research purposes.

DAACI-VoDAn is presented in the following paper, accepted for publication at EUSIPCO (European Signal Processing Conference) 2023. Please cite this paper when using the dataset:

Helena Cuesta, Nadine Kroher, Aggelos Pikrakis, Stojan Djordjevic. DAACI-VoDAn: Improving Vocal Detection with New Data and Methods. To appear in Proceedings of the European Signal Processing Conference (EUSIPCO). 2023.

Annotation format

Vocal regions are provided as CSV files with a unique track ID as filename (matching the IDs specified in the metadata file). They were created manually by a team of four annotators following a peer-reviewing strategy, i.e., each annotation was created by one annotator and reviewed by another.

Each row in the CSV files represents a vocal region given by the tuple [start_time, duration] in seconds.
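As an illustration of this format, the sketch below parses rows of `start_time,duration` into `(start, end)` pairs in seconds. It assumes comma-delimited files without a header row, which is not stated explicitly here; the function name and the example values are hypothetical.

```python
import csv
import io


def read_vocal_regions(lines):
    """Parse rows of `start_time,duration` (seconds) into (start, end) tuples."""
    regions = []
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        start, duration = float(row[0]), float(row[1])
        regions.append((start, start + duration))
    return regions


# Made-up content standing in for one annotation file.
example_csv = io.StringIO("12.5,8.0\n25.0,3.25\n")
regions = read_vocal_regions(example_csv)
print(regions)  # [(12.5, 20.5), (25.0, 28.25)]
```

For a file on disk, the same function can be given a file object opened with `open(path, newline="")`.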

The task of annotating vocal segments deals with numerous ambiguous scenarios. To provide consistent and extendable annotations, we defined an annotation strategy in the form of a set of rules: 

  • We defined a minimum inter-region distance of 200 ms. Two vocal regions that are at least 200 ms apart are annotated as separate regions. Vocal rests shorter than 200 ms are ignored, and a single continuous region is annotated instead.
  • We annotated heavily processed (e.g., chorus or distortion effects) singing voice sections as vocals, as long as they are still recognisable as originating from a human voice.
  • We extended vocal sections to include reverb and delay “tails” as long as they are audible. Similarly, we considered inhalations or “breathing” sounds, e.g., at the beginning of a vocal phrase, as part of the vocal section.
  • We considered segments containing speech as vocal sections but excluded whistling. Vocal ensembles and background vocals are also labelled as vocals.
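The minimum inter-region distance rule can be applied programmatically, for example when extending the dataset with new annotations. The following is a minimal sketch, not the annotators' actual tooling: it assumes regions are given as `(start, end)` pairs in seconds, and the function and constant names are hypothetical.

```python
MIN_GAP_S = 0.2  # minimum inter-region distance from the first rule (200 ms)


def merge_close_regions(regions, min_gap=MIN_GAP_S):
    """Merge (start, end) regions separated by a gap shorter than min_gap."""
    merged = []
    for start, end in sorted(regions):
        if merged and start - merged[-1][1] < min_gap:
            # Gap is too short to count as a vocal rest: extend the previous region.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Made-up regions: a 0.125 s gap (merged) followed by a 0.5 s gap (kept).
example = [(0.0, 1.0), (1.125, 2.0), (2.5, 3.0)]
print(merge_close_regions(example))  # [(0.0, 2.0), (2.5, 3.0)]
```

Gaps that fall exactly on the 200 ms boundary are subject to floating-point rounding, so a small tolerance may be needed in practice.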

 

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Request access

If you would like to request access to these files, please fill out the form below.

You need to satisfy these conditions in order for this request to be accepted:

The DAACI-VoDAn dataset is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Hence, access to the data files will be granted for research purposes only. When requesting the dataset, please briefly summarise your project and how the dataset will be used.

Please cite the paper listed under "Citing, license, and using the dataset" when using the dataset.
