Published July 19, 2024 | Version v2
Dataset Open

BPSD: A Coherent Multi-Version Dataset for Analyzing the First Movements of Beethoven's Piano Sonatas

  • 1. International Audio Laboratories Erlangen
  • 2. University of Würzburg

Description

-- Full paper: https://doi.org/10.5334/tismir.196 --

This repository contains the Beethoven Piano Sonata Dataset (BPSD), a multi-version dataset focusing on the first movements of Beethoven's 32 piano sonatas. Recognized as pivotal works in classical music, Beethoven's piano sonatas have profoundly shaped Western classical music, holding a significant place in cultural history.

The BPSD includes sheet music in different machine-readable formats and audio recordings from eleven performances, with four of them being in the public domain and freely accessible for research purposes. A key feature of BPSD is its coherence, ensuring alignment of all versions on a unified musical timeline and enforcing consistent structures through careful editing of both score and audio representations.

The design choices made in BPSD are motivated primarily by technical and computational considerations. In particular, BPSD facilitates the assessment of algorithmic approaches in tasks like harmony analysis, structure analysis, music transcription, beat and downbeat estimation, and score following. The dataset's coherence makes it an ideal platform for systematically training and evaluating deep learning methods, shedding light on their robustness and uncovering data biases across different data splits using cross-version strategies for evaluation.

To ease applicability for computational approaches, the BPSD is based on various simplifications that may be disputable from a musicological perspective. Rather than providing novel musicological annotations, the main conceptual contribution of BPSD with its measure annotations is to provide a framework for transferring existing annotations from the symbolic to the audio domain. We hope that, as such, BPSD is also useful for the systematic analysis and exploration of Beethoven's piano sonatas, providing insights into their influence on the development of harmony and structure in Western classical music. Beyond research applications, the dataset also holds educational potential, aiding in the preparation and presentation of Beethoven's work to a broader audience through interactive multimedia experiences.

Table of contents

- 0_RawData Raw audio and symbolic data
| - audio_ripped Audio files as ripped from the CD
  | - AS35 Recordings by Artur Schnabel
  | - FG58 Recordings by Friedrich Gulda
  | - FJ62 Recordings by Fritz Jank
  | - WK64 Recordings by Wilhelm Kempff
| - score_pdf_scan Scanned score from IMSLP
| - score_pdf_repetitions Symbolic score in PDF format with repeat signs
| - score_pdf_unfolded Symbolic score in PDF format with unfolded repetitions
| - score_sibelius_repetitions Symbolic score in Sibelius format with repeat signs
| - score_sibelius_unfolded Symbolic score in Sibelius format with unfolded repetitions
| - score_xml_repetitions Symbolic score in MusicXML format with repeat signs
| - score_xml_unfolded Symbolic score in MusicXML format with unfolded repetitions
| - score_midi MIDI export of the symbolic score
- 1_Audio Audio files with coherent structure
- 2_Annotations Annotations with musical and physical timelines
| - ann_score_note Note events with start and end given in musical time
| - ann_score_chord Harmony annotations given in musical time
| - ann_score_localkey Local key annotations given in musical time
| - ann_score_globalkey Global key annotations
| - ann_score_structureFine Fine structure annotations given in musical time
| - ann_score_structureCoarse Coarse structure annotations given in musical time
| - ann_audio_note Note events with start and end given in physical time
| - ann_audio_midi Note events in physical time in MIDI format
| - ann_audio_beat Beat annotations given in physical time
| - ann_audio_measure Measure annotations given in physical time
| - ann_audio_startEnd Start and end of audio recordings (for removing silence/applause) given in physical time
| - ann_audio_syncInfo Alignment tuples for converting between musical and physical timeline
| - ann_audio_modifications Annotations for structural modifications of recordings
| - ann_audio_chord Harmony annotations given in physical time
| - ann_audio_localkey Local key annotations given in physical time
| - ann_audio_structureFine Fine structure annotations given in physical time
| - ann_audio_structureCoarse Coarse structure annotations given in physical time
- 3_Scripts Python scripts to convert raw data into the structured format. Maintained code is available on GitHub
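
The alignment tuples in ann_audio_syncInfo are what connect the musical and physical timelines. The following is a minimal sketch of how such tuples could be used, assuming each tuple is a pair (physical time in seconds, musical time in measures) and that positions between tuples can be linearly interpolated. The function names, the tuple layout, and the toy values are assumptions for illustration, not the dataset's actual file format or the authors' conversion code.

```python
from bisect import bisect_right


def interpolate(xs, ys, x):
    """Piecewise-linear interpolation of y at x; xs must be strictly increasing."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    # Index of the segment [xs[i], xs[i + 1]) that contains x.
    i = bisect_right(xs, x) - 1
    frac = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + frac * (ys[i + 1] - ys[i])


def musical_to_physical(sync, measure_pos):
    """Map a musical position (in measures) to physical time (in seconds).

    `sync` is assumed to be a list of (seconds, measure) pairs sorted in time.
    """
    seconds = [s for s, _ in sync]
    measures = [m for _, m in sync]
    return interpolate(measures, seconds, measure_pos)


def physical_to_musical(sync, time_sec):
    """Map physical time (in seconds) to a musical position (in measures)."""
    seconds = [s for s, _ in sync]
    measures = [m for _, m in sync]
    return interpolate(seconds, measures, time_sec)


# Made-up alignment for illustration only: measure 1 starts at 0.0 s,
# measure 2 at 2.0 s, measure 3 at 5.0 s (the performer slows down).
toy_sync = [(0.0, 1.0), (2.0, 2.0), (5.0, 3.0)]
halfway_through_measure_2 = musical_to_physical(toy_sync, 2.5)
position_at_one_second = physical_to_musical(toy_sync, 1.0)
```

With a mapping of this kind, an annotation given in musical time (for example a chord label spanning a range of measures) can be transferred to each recording by converting its start and end positions.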

Other

Audio Versions

| ID | Performer | Recording year | Label | Release year | EAN code | MusicBrainz ReleaseID |
|---|---|---|---|---|---|---|
| AS35 | Artur Schnabel | 1935 | Warner Classics | 2016 | 0190295975050 | 7bd7338c-2acc-49f4-b262-122085a3e694 |
| FG58 | Friedrich Gulda | 1958 | Decca | 1958 | 028948514519 | n.a. |
| FJ62 | Fritz Jank | 1962 | Instituto Piano Brasileiro | 2021 | n.a. | available at IMSLP |
| WK64 | Wilhelm Kempff | 1964 | Deutsche Grammophon | 1995 | 028944796629 | 38864449-d1e9-4b4f-b5a6-e73acc954e27 |
| FG67 | Friedrich Gulda | 1967 | Amadeo/Decca | 1968 | 028947687610 | 83f869ea-fc64-4fe9-b424-52d4282f706f |
| VA81 | Vladimir Ashkenazy | 1981 | London Records | 1995 | 028944370621 | 36fcb34f-59ab-3e4d-a066-3067ed82ed33 |
| DB84 | Daniel Barenboim | 1984 | Deutsche Grammophon | 1984 | 028941375926, 028941376626 | 261a38ba-9c56-458e-9a4d-c7b6b4acb3a3, b4b49c3d-f86a-4701-b967-d4e726ab8ef0 |
| JJ90 | Jeno Jando | 1990 | NAXOS | 1990 | 730099150224 | 2f94e0a3-be66-4894-9c9d-83d5890081da |
| AB96 | Alfred Brendel | 1996 | Philips | 1996 | 028941257529 | 6f419224-c6fb-4c38-871e-5799b755a387 |
| MB97 | Malcolm Bilson et al. | 1997 | Claves | 1997 | 7619931970721 | 718bac94-7c2c-48ea-8f30-dc230aab019d |
| MC22 | Muriel Chemin | 2022 | Odradek | 2022 | 855317003615 | n.a. |
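
The eleven version IDs listed here are the natural unit for the cross-version evaluation mentioned in the description: a model is trained on some performances and tested on others it has never heard. Below is a minimal sketch of such a split. The helper names `version_split` and `leave_one_version_out` are hypothetical, and the splitting scheme is only one possible choice, not the protocol used in the paper.

```python
# Version IDs as listed under "Audio Versions".
ALL_VERSIONS = [
    "AS35", "FG58", "FJ62", "WK64", "FG67", "VA81",
    "DB84", "JJ90", "AB96", "MB97", "MC22",
]


def version_split(versions, test_versions):
    """Split version IDs into (train, test) lists, preserving the input order."""
    unknown = set(test_versions) - set(versions)
    if unknown:
        raise ValueError(f"Unknown version IDs: {sorted(unknown)}")
    train = [v for v in versions if v not in test_versions]
    test = [v for v in versions if v in test_versions]
    return train, test


def leave_one_version_out(versions):
    """Yield one (train, test) split per version, holding that version out."""
    for held_out in versions:
        yield version_split(versions, {held_out})


# Illustrative split: hold out the oldest and the newest recording.
train_ids, test_ids = version_split(ALL_VERSIONS, {"AS35", "MC22"})
folds = list(leave_one_version_out(ALL_VERSIONS))
```

Splitting by version keeps every sonata in both partitions while varying performer and recording conditions; splitting by sonata instead would test generalization to unseen pieces.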

Notes

This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) within the project "Computational Analysis of Harmonic Structures" under No. 252013209 (MU 2686/7-2), the project "Learning with Music Signals: Technology Meets Education" under No. 500643750 (MU 2686/15-1), and the Emmy Noether research group "Computational Analysis of Music Audio Recordings: A Cross-Version Approach" under No. 531250483 (WE 6611/3-1). We express our gratitude to all team members, student assistants, and colleagues who played pivotal roles in the data curation process and annotation work. Special mentions among the numerous contributors include Harald Grohganz, Celina Hüttner, Nanzhu Jiang, and Michael Kohl. The International Audio Laboratories Erlangen are a joint institution of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Fraunhofer Institute for Integrated Circuits IIS.

Technical info

Known Problems

  • Cadenza missing in Op002No3-01
    • The cadenza in Op002No3-01, measure 322, is not included in the Sibelius files, XML files, MIDI files, ann_score_note, and ann_audio_note
  • Missing measures in Op057-01_FJ62
    • Jank's recording deviates from the notated score in Op057-01, starting at measure 86.5 and continuing for 2 more measures. He then directly jumps to measure 93, where the playing is correct again. In summary, 2 measures deviate from the score and 4.5 measures are missing completely.

Files

Beethoven_Piano_Sonata_Dataset_v2.zip

Files (2.1 GB)

md5:17f73a2d608fd6e6da65d3bd9dda1923
Size: 2.1 GB
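
The MD5 checksum listed for the archive can be used to check that a download is complete. The sketch below uses Python's standard `hashlib` module; the function names are hypothetical, and the file path in the usage note assumes the archive was saved under its original name in the current directory.

```python
import hashlib

# Checksum as listed for Beethoven_Piano_Sonata_Dataset_v2.zip.
EXPECTED_MD5 = "17f73a2d608fd6e6da65d3bd9dda1923"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_download("Beethoven_Piano_Sonata_Dataset_v2.zip")` should return `True` for an intact download; a `False` result suggests a truncated or corrupted file.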

Additional details

References

  • Johannes Zeitler, Christof Weiß, Vlora Arifi-Müller, and Meinard Müller. (2024). "BPSD: A Coherent Multi-Version Dataset for Analyzing the First Movements of Beethoven's Piano Sonatas." Transactions of the International Society for Music Information Retrieval (TISMIR), 7(1), 195-212.