Multilingual lyrics-to-audio alignment

doi:10.5281/zenodo.4245484

Published October 11, 2020 | Version v1

Conference paper | Open access

Lyrics-to-audio alignment methods have recently reported impressive results, opening the door to practical applications such as karaoke and within-song navigation. However, most studies focus on a single language, usually English, for which annotated data are abundant. Whether these methods generalize to other languages, especially in low (or even zero) training resource scenarios, has so far been left unexplored. In this paper, we address the lyrics-to-audio alignment task in a generalized multilingual setup. More precisely, this investigation presents the first attempt (to the best of our knowledge) to create a language-independent lyrics-to-audio alignment system. Building on an RNN model trained with a CTC algorithm, we study the relevance of different intermediate representations, either character or phoneme, along with several strategies to design a training set. The evaluation is conducted on multiple languages with varying amounts of available data, from plenty to zero. Results show that learning from diverse data and using a universal phoneme set as an intermediate representation yield the best generalization performance.
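As an illustration of how a CTC-trained model can be used for alignment, the following is a minimal sketch of generic CTC forced alignment by Viterbi decoding. It is not taken from the paper: the abstract does not describe the decoding procedure, so the function names (`ctc_forced_align`, `symbol_onsets`), the toy three-symbol vocabulary, and the hand-written frame posteriors are all assumptions made for the example. The symbol ids could stand for either characters or phonemes.

```python
import math

NEG_INF = float("-inf")


def ctc_forced_align(log_probs, targets, blank=0):
    """Most likely frame-level label path that collapses to `targets`.

    log_probs: list of T frames, each a list of log-probabilities per symbol id.
    targets:   non-empty list of symbol ids (characters or phonemes), no blanks.
    """
    if not log_probs or not targets:
        raise ValueError("need at least one frame and one target symbol")
    # Extended sequence with blanks: blank, t1, blank, t2, ..., tN, blank
    ext = [blank]
    for sym in targets:
        ext += [sym, blank]
    T, S = len(log_probs), len(ext)

    score = [[NEG_INF] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    # A path may start on the leading blank or on the first symbol.
    score[0][0] = log_probs[0][ext[0]]
    score[0][1] = log_probs[0][ext[1]]

    for t in range(1, T):
        for s in range(S):
            # Allowed moves: stay, advance by one state, or skip the blank
            # between two different non-blank symbols.
            candidates = [s]
            if s >= 1:
                candidates.append(s - 1)
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(s - 2)
            best = max(candidates, key=lambda p: score[t - 1][p])
            if score[t - 1][best] == NEG_INF:
                continue  # state not reachable at this frame
            score[t][s] = score[t - 1][best] + log_probs[t][ext[s]]
            back[t][s] = best

    # A valid path ends on the last symbol or on the trailing blank.
    s = S - 1 if score[T - 1][S - 1] >= score[T - 1][S - 2] else S - 2
    if score[T - 1][s] == NEG_INF:
        raise ValueError("not enough frames to align the target sequence")
    path = [blank] * T
    for t in range(T - 1, -1, -1):
        path[t] = ext[s]
        s = back[t][s]
    return path


def symbol_onsets(path, blank=0):
    """(symbol id, first frame index) for each aligned symbol occurrence."""
    onsets, prev = [], blank
    for t, sym in enumerate(path):
        if sym != blank and sym != prev:
            onsets.append((sym, t))
        prev = sym
    return onsets


# Toy example (assumed values): ids 0 = blank, 1 = "a", 2 = "b"; four frames.
frames = [
    [0.1, 0.8, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.2, 0.1, 0.7],
]
log_probs = [[math.log(p) for p in frame] for frame in frames]
path = ctc_forced_align(log_probs, [1, 2])
print(path)                 # [1, 0, 2, 2]
print(symbol_onsets(path))  # [(1, 0), (2, 2)]
```

Multiplying each onset frame index by the model's frame hop would give timestamps in seconds. Swapping the target ids between a character vocabulary and a phoneme vocabulary leaves the decoding unchanged, which is why the choice of intermediate representation can be studied independently of the alignment step.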

Files

101.pdf (282.3 kB), md5:be302deb29b4a4e4c258c701845e1b41

Resource type

Conference paper

Publisher

ISMIR

Imprint

Proceedings of the 21st International Society for Music Information Retrieval Conference, 512-519. Montreal, Canada.

Conference

International Society for Music Information Retrieval Conference (ISMIR 2020), Montreal, Canada, October 11-16, 2020

Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: November 5, 2020
Modified: July 19, 2024
