Conference paper Open Access

Learning Pitch-Class Representations from Score-Audio Pairs of Classical Music

Christof Weiss; Johannes Zeitler; Tim Zunner; Florian Schuberth; Meinard Müller

Chroma or pitch-class representations of audio recordings are an essential tool in music information retrieval. Traditional chroma features relying on signal processing are often influenced by timbral properties such as overtones or vibrato and thus only roughly correspond to the pitch classes indicated by a score. Deep learning offers a promising way to overcome such problems but requires large annotated datasets. Previous approaches therefore use either synthetic audio, MIDI-piano recordings, or chord annotations for training. Since these strategies have different limitations, we propose to learn transcription-like pitch-class representations using pre-synchronized score-audio pairs of classical music. We train several CNNs with musically inspired architectures and evaluate their pitch-class estimates for various instrumentations including orchestra, piano, chamber music, and singing. Moreover, we illustrate the learned features' behavior when used as input to a chord recognition system. In all our experiments, we compare cross-validation with cross-dataset evaluation. Obtaining promising results, our strategy shows how to leverage the power of deep learning for constructing robust yet interpretable tonal representations.
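The "traditional chroma features relying on signal processing" that the abstract contrasts with the learned representations can be illustrated with a minimal sketch. This is not code from the paper: the helper name `chroma_from_signal` and all parameter choices are illustrative assumptions. The idea is to fold the energy of each spectral bin into one of 12 pitch-class bins, which is exactly why overtones leak into other pitch classes (e.g., the third harmonic of C lands on G).

```python
import numpy as np

def chroma_from_signal(x, sr, n_fft=4096):
    # Illustrative sketch, not the paper's method: map spectral
    # energy of a single windowed frame onto 12 pitch-class bins.
    spec = np.abs(np.fft.rfft(x[:n_fft] * np.hanning(n_fft)))
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    chroma = np.zeros(12)
    for f, m in zip(freqs[1:], spec[1:]):  # skip the DC bin
        # Convert frequency to a MIDI pitch, then wrap to a pitch class
        # (0 = C, ..., 9 = A, ..., 11 = B).
        midi = 69 + 12 * np.log2(f / 440.0)
        chroma[int(round(midi)) % 12] += m
    return chroma / (chroma.sum() + 1e-12)  # normalize to sum 1

sr = 22050
t = np.arange(4096) / sr
x = np.sin(2 * np.pi * 440.0 * t)  # pure A4 tone
c = chroma_from_signal(x, sr)
print(int(np.argmax(c)))  # -> 9, the pitch class A
```

For a pure sine this works cleanly, but a real instrument tone at 440 Hz also has harmonics at 880, 1320, 1760 Hz, whose energy is folded onto pitch classes A, E, and A again, blurring the score's pitch-class content. This is the timbral sensitivity the paper's learned, transcription-like representations aim to reduce.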