Thesis Open Access
In this thesis, we devise computational models for tracking sung lyrics in multi-instrumental music recordings. We consider not only the low-level acoustic characteristics representing the timbre of the sung phonemes, but also higher-level music knowledge that is complementary to the lyrics. We build probabilistic models, based on dynamic Bayesian networks (DBNs), that relate phoneme transitions to two facets of music knowledge: the temporal structure of a lyrics line and the structure of the metrical cycle. In one model, we exploit the fact that the expected duration of a syllable depends on its position within a lyrics line. In another model, we estimate vocal onsets by simultaneously tracking the position in the metrical cycle, and we model how these estimated onsets influence the transitions between consecutive phonemes. Using the proposed models, sung lyrics are automatically aligned to written lyrics on datasets of Ottoman Turkish makam music and Beijing opera, whereby principles specific to these music traditions are taken into account. Both models improve over a baseline that is unaware of music-specific knowledge. This confirms that music-specific knowledge is an important stepping stone for computationally tracking lyrics, especially in the challenging case of singing with instrumental accompaniment.
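The duration-dependent phoneme transitions described above can be illustrated with a minimal Viterbi forced alignment, in which each phoneme state's self-loop probability is derived from its expected duration (longer expected duration means a higher probability of remaining in the state). This is only a hedged sketch of the general idea: the function name, the left-to-right state topology, and the mapping from expected duration to transition probability are illustrative assumptions, not the actual DBN formulation of the thesis.

```python
import math

def viterbi_align(emissions, expected_durs):
    """Align a left-to-right phoneme sequence to acoustic frames.

    emissions      -- emissions[t][s]: log-likelihood of frame t under state s
    expected_durs  -- expected duration (in frames) of each phoneme state;
                      the self-loop probability (d - 1) / d is a simple
                      illustrative choice, not the thesis's model
    Returns the most likely state index for every frame.
    """
    n_frames, n_states = len(emissions), len(expected_durs)
    stay = [math.log((d - 1) / d) for d in expected_durs]   # self-loop log-prob
    move = [math.log(1.0 / d) for d in expected_durs]       # advance log-prob
    NEG = float("-inf")
    dp = [[NEG] * n_states for _ in range(n_frames)]        # best log-score
    bp = [[0] * n_states for _ in range(n_frames)]          # backpointers
    dp[0][0] = emissions[0][0]                              # must start in state 0
    for t in range(1, n_frames):
        for s in range(n_states):
            best, arg = dp[t - 1][s] + stay[s], s           # stay in state s
            if s > 0 and dp[t - 1][s - 1] + move[s - 1] > best:
                best, arg = dp[t - 1][s - 1] + move[s - 1], s - 1  # advance
            dp[t][s] = best + emissions[t][s]
            bp[t][s] = arg
    path = [n_states - 1]                                   # must end in last state
    for t in range(n_frames - 1, 0, -1):
        path.append(bp[t][path[-1]])
    return path[::-1]

# Toy example: two phonemes, each expected to last 3 of the 6 frames.
emissions = [[0.0, -5.0]] * 3 + [[-5.0, 0.0]] * 3
print(viterbi_align(emissions, [3, 3]))  # -> [0, 0, 0, 1, 1, 1]
```

Position-dependent durations (e.g., lengthened syllables at the end of a lyrics line) would enter this sketch simply by assigning a larger `expected_durs` value to line-final phonemes, which raises their self-loop probability.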