Published June 17, 2019 | Version 1.1
Dataset Open

MSMD - Multimodal Sheet Music Dataset

  • 1. Johannes Kepler University Linz
  • 2. Charles University Prague

Contributors

  • 1. Johannes Kepler University Linz

Description

MSMD is a synthetic dataset of 497 pieces of (classical) music that contains both audio and score representations of the pieces aligned at a fine-grained level (344,742 pairs of noteheads aligned to their audio/MIDI counterpart). It can be used for training and evaluating multimodal models that enable crossing from one modality to the other, such as retrieving sheet music using recordings or following a performance in the score image.

Further information and a corresponding Python package are available on GitHub: https://github.com/CPJKU/msmd

If you use this dataset, please cite:
[1] Matthias Dorfer, Jan Hajič jr., Andreas Arzt, Harald Frostel, Gerhard Widmer.
Learning Audio-Sheet Music Correspondences for Cross-Modal Retrieval and Piece Identification.
Transactions of the International Society for Music Information Retrieval, issue 1, 2018.

Files

Files (9.6 GB)

Name: msmd_aug_v1-1_no-audio.zip
Size: 9.6 GB
md5: cf843c481c2ff811d9b26c6ca7df60ab
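
Because the archive is large, it may be worth checking the download against the published md5 checksum before unpacking it. The following is a minimal sketch using only the Python standard library. It assumes the archive was saved under its original name in the current directory. The helper names md5_of_file and verify_archive are illustrative and not part of the msmd package.

```python
import hashlib
from pathlib import Path

# Checksum as published in the file listing of this record.
EXPECTED_MD5 = "cf843c481c2ff811d9b26c6ca7df60ab"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that a multi-gigabyte archive never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local path: the archive name from the listing, in the
    # current working directory.
    archive = Path("msmd_aug_v1-1_no-audio.zip")
    if not archive.exists():
        print(f"{archive} not found; download it first")
    elif verify_archive(archive):
        print("checksum OK")
    else:
        print("checksum MISMATCH: the download may be incomplete or corrupted")
```

A mismatch usually indicates an interrupted or corrupted download, in which case fetching the archive again is the simplest remedy.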

Additional details

Funding

Con Espressione – Getting at the Heart of Things: Towards Expressivity-aware Computer Systems in Music (grant no. 670035)
European Commission