Thesis Open Access

Knowledge-based Probabilistic Modeling for Tracking Lyrics in Music Audio Signals

Dzhambazov, Georgi

Thesis supervisor(s)

Serra, Xavier

In this thesis, we devise computational models for tracking sung lyrics in multi-instrumental music recordings. We consider not only the low-level acoustic characteristics that represent the timbre of the sung phonemes, but also higher-level music knowledge that is complementary to the lyrics. We build probabilistic models based on dynamic Bayesian networks (DBNs) that represent the relation of phoneme transitions to two facets of music knowledge: the temporal structure of a lyrics line and the structure of the metrical cycle. In one model, we exploit the fact that the expected duration of a syllable depends on its position within a lyrics line. In another model, we propose a way to estimate vocal onsets while simultaneously tracking the position in the metrical cycle, and we describe how these estimated onsets influence the transitions between consecutive phonemes. Using the proposed models, sung lyrics are automatically aligned to written lyrics on datasets of Ottoman Turkish makam and Beijing opera, taking into account principles specific to these music traditions. Both models improve on a baseline that is unaware of music-specific knowledge. This confirms that music-specific knowledge is an important stepping stone for computationally tracking lyrics, especially in the challenging case of singing with instrumental accompaniment.
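The idea that estimated vocal onsets influence phoneme transitions can be illustrated with a much simplified sketch. The code below is not the DBN from the thesis: it assumes a plain left-to-right hidden Markov model decoded with the Viterbi algorithm, in which the probability of advancing to the next phoneme at a given frame is raised when an onset is likely there. The function name `align`, the parameters `base_advance` and `weight`, and the linear mixing rule are all hypothetical choices made for illustration.

```python
import math


def align(loglik, onset_prob, base_advance=0.1, weight=0.8):
    """Onset-informed forced alignment (illustrative sketch only).

    loglik[t][s]  : log-likelihood of audio frame t under phoneme s,
                    with phonemes listed in lyric order.
    onset_prob[t] : estimated probability of a vocal onset at frame t.
    Returns the phoneme index assigned to each frame.
    Assumes there are at least as many frames as phonemes.
    """
    n_frames, n_states = len(loglik), len(loglik[0])
    neg_inf = float("-inf")
    score = [[neg_inf] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    score[0][0] = loglik[0][0]  # the alignment must start in the first phoneme

    for t in range(1, n_frames):
        # Hypothetical mixing rule: a likely onset makes a phoneme change likelier.
        p_adv = (1 - weight) * base_advance + weight * onset_prob[t]
        p_adv = min(max(p_adv, 1e-6), 1 - 1e-6)  # keep both logarithms finite
        log_stay, log_adv = math.log(1 - p_adv), math.log(p_adv)
        for s in range(n_states):
            stay = score[t - 1][s] + log_stay
            adv = score[t - 1][s - 1] + log_adv if s > 0 else neg_inf
            if adv > stay:
                score[t][s], back[t][s] = adv + loglik[t][s], s - 1
            else:
                score[t][s], back[t][s] = stay + loglik[t][s], s

    # Backtrack from the last phoneme, which the alignment must end in.
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]
```

With uninformative acoustics, for example `align([[0.0, 0.0]] * 6, [0, 0, 0, 0, 1, 0])`, the phoneme boundary is placed at the frame where the onset is detected, giving `[0, 0, 0, 0, 1, 1]`. When the onset probabilities are all zero, the boundary is decided by the acoustic scores alone.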

Funded by the CompMusic project (European Research Council grant agreement 267583) and the Catalan Scholarship by the Agència de Gestió d'Ajuts Universitaris i de Recerca (AGAUR)