BPSD: A Coherent Multi-Version Dataset for Analyzing the First Movements of Beethoven's Piano Sonatas
Description
-- Full paper: https://doi.org/10.5334/tismir.196 --
This repository contains the Beethoven Piano Sonata Dataset (BPSD), a multi-version dataset focusing on the first movements of Beethoven's 32 piano sonatas. These sonatas are pivotal works that have profoundly shaped Western classical music and hold a significant place in cultural history.
The BPSD includes sheet music in several machine-readable formats and audio recordings of eleven performances, four of which are in the public domain and freely accessible for research purposes. A key feature of BPSD is its coherence: all versions are aligned on a unified musical timeline, and consistent structures are enforced through careful editing of both score and audio representations.
The design choices made in BPSD are motivated primarily by technical and computational considerations. In particular, BPSD facilitates the assessment of algorithmic approaches in tasks like harmony analysis, structure analysis, music transcription, beat and downbeat estimation, and score following. The dataset's coherence makes it well suited for systematically training and evaluating deep learning methods, shedding light on their robustness and uncovering data biases across different data splits using cross-version strategies for evaluation.
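As a rough illustration of such cross-version evaluation, the following sketch builds two kinds of splits from track identifiers: one that holds out whole performances and one that holds out whole sonatas. The identifier pattern is inferred from names such as `Op057-01_FJ62` and `Op002No3-01` that appear on this page; the helper names (`parse_track_id`, `version_split`, `work_split`) are hypothetical and not part of the dataset's scripts.

```python
import re

# Assumed ID layout: "<work>_<version>", e.g. "Op057-01_FJ62".
# The pattern is inferred from examples on this page, not from a spec.
TRACK_ID = re.compile(
    r"^(?P<work>Op\d{3}(?:No\d+)?-\d{2})_(?P<version>[A-Z]{2}\d{2})$"
)


def parse_track_id(track_id):
    """Split an ID such as 'Op057-01_FJ62' into (work, version)."""
    match = TRACK_ID.match(track_id)
    if match is None:
        raise ValueError(f"unexpected track ID: {track_id!r}")
    return match.group("work"), match.group("version")


def version_split(track_ids, test_versions):
    """Hold out whole performances: test versions never occur in training."""
    train, test = [], []
    for track_id in track_ids:
        _, version = parse_track_id(track_id)
        (test if version in test_versions else train).append(track_id)
    return train, test


def work_split(track_ids, test_works):
    """Hold out whole sonatas: test works never occur in training."""
    train, test = [], []
    for track_id in track_ids:
        work, _ = parse_track_id(track_id)
        (test if work in test_works else train).append(track_id)
    return train, test
```

Comparing results under the two splits gives one way to separate generalization across performers and recording conditions from generalization across compositions.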
To ease applicability for computational approaches, the BPSD is based on various simplifications that may be disputable from a musicological perspective. Rather than providing novel musicological annotations, the main conceptual contribution of BPSD with its measure annotations is to provide a framework for transferring existing annotations from the symbolic to the audio domain. We hope that, as such, BPSD is also useful for the systematic analysis and exploration of Beethoven's piano sonatas, providing insights into their influence on the development of harmony and structure in Western classical music. Beyond research applications, the dataset also holds educational potential, aiding in the preparation and presentation of Beethoven's work to a broader audience through interactive multimedia experiences.
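The transfer from the symbolic to the audio domain can be pictured as a mapping from measure positions to seconds. The sketch below is a minimal illustration of that idea using piecewise-linear interpolation between alignment anchors. It assumes the anchors are available as `(measure_position, seconds)` pairs in increasing order; the actual file layout of the measure and alignment annotations is not described here, and the function names are hypothetical.

```python
from bisect import bisect_right


def musical_to_physical(measure_pos, anchors):
    """Interpolate a physical time (seconds) for a measure position.

    `anchors` is an assumed list of (measure_position, seconds) pairs,
    strictly increasing in measure_position.
    """
    if len(anchors) < 2:
        raise ValueError("need at least two anchors")
    positions = [pos for pos, _ in anchors]
    if not positions[0] <= measure_pos <= positions[-1]:
        raise ValueError("position outside the annotated range")
    # Index of the segment [anchors[i], anchors[i + 1]] containing measure_pos.
    i = min(bisect_right(positions, measure_pos) - 1, len(anchors) - 2)
    (p0, t0), (p1, t1) = anchors[i], anchors[i + 1]
    return t0 + (measure_pos - p0) / (p1 - p0) * (t1 - t0)


def transfer_annotations(score_annotations, anchors):
    """Map (start, end, label) triples from musical to physical time."""
    return [
        (musical_to_physical(start, anchors),
         musical_to_physical(end, anchors),
         label)
        for start, end, label in score_annotations
    ]
```

Because every version is aligned to the same musical timeline, one score-level annotation can be mapped to each recording by swapping in that recording's anchors.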
Table of contents
- 0_RawData | Raw audio and symbolic data |
| - audio_ripped | Audio files as ripped from the CD |
| - AS35 | Recordings by Artur Schnabel |
| - FG58 | Recordings by Friedrich Gulda |
| - FJ62 | Recordings by Fritz Jank |
| - WK64 | Recordings by Wilhelm Kempff |
| - score_pdf_scan | Scanned score from IMSLP |
| - score_pdf_repetitions | Symbolic score in PDF format with repeat signs |
| - score_pdf_unfolded | Symbolic score in PDF format with unfolded repetitions |
| - score_sibelius_repetitions | Symbolic score in Sibelius format with repeat signs |
| - score_sibelius_unfolded | Symbolic score in Sibelius format with unfolded repetitions |
| - score_xml_repetitions | Symbolic score in MusicXML format with repeat signs |
| - score_xml_unfolded | Symbolic score in MusicXML format with unfolded repetitions |
| - score_midi | MIDI export of the symbolic score |
- 1_Audio | Audio files with coherent structure |
- 2_Annotations | Annotations with musical and physical timelines |
| - ann_score_note | Note events with start and end given in musical time |
| - ann_score_chord | Harmony annotations given in musical time |
| - ann_score_localkey | Local key annotations given in musical time |
| - ann_score_globalkey | Global key annotations |
| - ann_score_structureFine | Fine structure annotations given in musical time |
| - ann_score_structureCoarse | Coarse structure annotations given in musical time |
| - ann_audio_note | Note events with start and end given in physical time |
| - ann_audio_midi | Note events in physical time in MIDI format |
| - ann_audio_beat | Beat annotations given in physical time |
| - ann_audio_measure | Measure annotations given in physical time |
| - ann_audio_startEnd | Start and end of audio recordings (for removing silence/applause) given in physical time |
| - ann_audio_syncInfo | Alignment tuples for converting between musical and physical timeline |
| - ann_audio_modifications | Annotations for structural modifications of recordings |
| - ann_audio_chord | Harmony annotations given in physical time |
| - ann_audio_localkey | Local key annotations given in physical time |
| - ann_audio_structureFine | Fine structure annotations given in physical time |
| - ann_audio_structureCoarse | Coarse structure annotations given in physical time |
- 3_Scripts | Python scripts to convert raw data into the structured format. Maintained code is available on GitHub |
Audio Versions
ID | Performer | Recording Year | Label | Release year | EAN code | MusicBrainz ReleaseID |
AS35 | Artur Schnabel | 1935 | Warner Classics | 2016 | 0190295975050 | 7bd7338c-2acc-49f4-b262-122085a3e694 |
FG58 | Friedrich Gulda | 1958 | Decca | 1958 | 028948514519 | n.a. |
FJ62 | Fritz Jank | 1962 | Instituto Piano Brasileiro | 2021 | n.a. | available at IMSLP |
WK64 | Wilhelm Kempff | 1964 | Deutsche Grammophon | 1995 | 028944796629 | 38864449-d1e9-4b4f-b5a6-e73acc954e27 |
FG67 | Friedrich Gulda | 1967 | Amadeo/Decca | 1968 | 028947687610 | 83f869ea-fc64-4fe9-b424-52d4282f706f |
VA81 | Vladimir Ashkenazy | 1981 | London Records | 1995 | 028944370621 | 36fcb34f-59ab-3e4d-a066-3067ed82ed33 |
DB84 | Daniel Barenboim | 1984 | Deutsche Grammophon | 1984 | 028941375926, 028941376626 | 261a38ba-9c56-458e-9a4d-c7b6b4acb3a3, b4b49c3d-f86a-4701-b967-d4e726ab8ef0 |
JJ90 | Jeno Jando | 1990 | NAXOS | 1990 | 730099150224 | 2f94e0a3-be66-4894-9c9d-83d5890081da |
AB96 | Alfred Brendel | 1996 | Philips | 1996 | 028941257529 | 6f419224-c6fb-4c38-871e-5799b755a387 |
MB97 | Malcolm Bilson et al. | 1997 | Claves | 1997 | 7619931970721 | 718bac94-7c2c-48ea-8f30-dc230aab019d |
MC22 | Muriel Chemin | 2022 | Odradek | 2022 | 855317003615 | n.a. |
Technical info
Known Problems
- Cadenza missing in Op002No3-01
  - The cadenza in Op002No3-01, measure 322, is not included in the Sibelius files, XML files, MIDI files, ann_score_note, and ann_audio_note.
- Missing measures in Op057-01_FJ62
  - Jank's recording deviates from the notated score in Op057-01, starting at measure 86.5 and continuing for 2 more measures (up to measure 88.5). He then jumps directly to measure 93, where the playing is correct again. In summary, 2 measures deviate from the score and the 4.5 measures from 88.5 to 93 are missing completely.
Files
Name | Size | MD5 |
Beethoven_Piano_Sonata_Dataset_v2.zip | 2.1 GB | 17f73a2d608fd6e6da65d3bd9dda1923 |
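After downloading, the archive can be checked against the MD5 checksum listed above. The following is a minimal sketch using only the Python standard library; the local path passed to `verify_download` is an assumption about where the file was saved, and the function names are illustrative.

```python
import hashlib

# Checksum as given in the file listing on this page.
EXPECTED_MD5 = "17f73a2d608fd6e6da65d3bd9dda1923"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_download("Beethoven_Piano_Sonata_Dataset_v2.zip")` should return `True` for an intact download in the current directory. Reading in chunks keeps memory use low for a 2.1 GB file.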
Additional details
References
- Johannes Zeitler, Christof Weiß, Vlora Arifi-Müller, and Meinard Müller. (2024). "BPSD: A Coherent Multi-Version Dataset for Analyzing the First Movements of Beethoven's Piano Sonatas." Transactions of the International Society for Music Information Retrieval (TISMIR), 7(1), 195-212. https://doi.org/10.5334/tismir.196