DALI: A Large Dataset of Synchronized Audio, Lyrics and notes, Automatically Created using Teacher-student Machine Learning Paradigm.

doi:10.5281/zenodo.1492443

Published September 23, 2018 | Version v1

Conference paper | Open access

The goal of this paper is twofold. First, we introduce DALI, a large and rich multimodal dataset containing 5358 audio tracks with their time-aligned vocal melody notes and lyrics at four levels of granularity. Second, we explain our methodology, in which dataset creation and model learning interact through a teacher-student machine learning paradigm so that each benefits the other. We start with a set of manual annotations of draft time-aligned lyrics and notes made by non-expert users of karaoke games. This set comes without audio, so we need to find the corresponding audio and adapt the annotations to it. To that end, we retrieve audio candidates from the Web. Each candidate is then turned into a singing-voice probability over time using a teacher, a deep convolutional neural network singing-voice detection (SVD) system trained on cleaned data. By comparing the time-aligned lyrics with the singing-voice probability, we detect matches and update the time alignment of the lyrics accordingly. From this, we obtain new audio sets, which are then used to train new SVD students that perform the above comparison again. The process can be repeated iteratively. We show that this progressively improves the performance of our SVD system and yields better audio matching and alignment.
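The comparison step described above (matching time-aligned lyrics against a singing-voice probability curve) can be illustrated with a small sketch. The abstract does not state which similarity measure or search strategy is used, so everything below is an assumption made for illustration: the function names `annotation_to_activity` and `best_offset` are hypothetical, the similarity is a plain cosine similarity, and only a global time offset is searched.

```python
import numpy as np


def annotation_to_activity(segments, n_frames, frame_rate):
    """Turn (start_sec, end_sec) lyric segments into a 0/1 vector per frame."""
    activity = np.zeros(n_frames)
    for start, end in segments:
        lo = max(0, int(round(start * frame_rate)))
        hi = min(n_frames, int(round(end * frame_rate)))
        if hi > lo:
            activity[lo:hi] = 1.0
    return activity


def best_offset(segments, svd_prob, frame_rate, max_shift_sec, step_sec=0.1):
    """Grid-search a global time offset (hypothetical helper, not the paper's method).

    Each candidate offset shifts all segments, rebuilds the binary activity
    vector, and scores it against svd_prob with cosine similarity.
    Returns (offset_sec, score) for the highest-scoring offset.
    """
    svd_prob = np.asarray(svd_prob, dtype=float)
    n_frames = len(svd_prob)
    prob_norm = np.linalg.norm(svd_prob)
    best = (0.0, -1.0)
    offsets = np.arange(-max_shift_sec, max_shift_sec + step_sec / 2, step_sec)
    for offset in offsets:
        shifted = [(s + offset, e + offset) for s, e in segments]
        activity = annotation_to_activity(shifted, n_frames, frame_rate)
        denom = np.linalg.norm(activity) * prob_norm
        score = float(activity @ svd_prob / denom) if denom > 0 else 0.0
        if score > best[1]:
            best = (float(offset), score)
    return best


# Synthetic example: the "audio" has singing at 3-5 s and 6-8 s,
# while the draft annotation places the same two phrases 2 s earlier.
frame_rate = 10  # frames per second (assumed)
svd_prob = np.zeros(100)
svd_prob[30:50] = 1.0
svd_prob[60:80] = 1.0
draft_segments = [(1.0, 3.0), (4.0, 6.0)]
offset, score = best_offset(draft_segments, svd_prob, frame_rate, max_shift_sec=3.0)
print(offset, score)
```

Under these assumptions, the same score could also rank several audio candidates for one annotation: the candidate with the highest best-offset score would be kept, and its offset would be used to update the lyric timings.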

Files

35_Paper.pdf (1.2 MB) md5:0e959ffd3e7fa1b16604b0d4034d8cf3

Resource type

Conference paper

Publisher

ISMIR

Imprint

Proceedings of the 19th International Society for Music Information Retrieval Conference, 431-437. Paris, France.

Conference

International Society for Music Information Retrieval Conference (ISMIR 2018), Paris, France, September 23-27, 2018

Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: November 20, 2018
Modified: August 2, 2024
