Piano Syllabus Dataset
Description
Title: Piano Syllabus Dataset (PSyllabus)
Authors: Pedro Ramoneda, Minhee Lee, Dasaem Jeong, Jose J. Valero-Mas, Xavier Serra
Description:
The Piano Syllabus Dataset (PSyllabus) is a curated collection of 7,901 solo piano recordings annotated with performance difficulty levels drawn from established music-education syllabi (e.g., ABRSM, RCM and Trinity). The dataset spans 1,233 composers and 11 difficulty levels, from beginner to advanced.
Each audio recording is paired with metadata specifying the composer, title, syllabus source, level, and other relevant information. The difficulty levels reflect real-world pedagogical standards and serve as a proxy for playability and technical complexity.
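As a minimal sketch of working with the per-recording metadata, the function below tallies recordings per difficulty level. It assumes, hypothetically, that the metadata has been parsed from JSON into a dictionary mapping a recording ID to a record with a `level` field; the actual key names used in new_clean_data.json may differ and should be checked against the file.

```python
from collections import Counter
from typing import Any, Mapping


def level_histogram(
    metadata: Mapping[str, Mapping[str, Any]], level_key: str = "level"
) -> Counter:
    """Count how many recordings fall into each difficulty level.

    ASSUMPTION: `metadata` maps a recording ID to a dict of fields, and
    `level_key` (hypothetically "level") names the difficulty-level field.
    """
    return Counter(record[level_key] for record in metadata.values())


if __name__ == "__main__":
    # Toy records standing in for the real metadata (illustrative values only).
    toy = {
        "rec_a": {"composer": "Composer A", "level": 1},
        "rec_b": {"composer": "Composer B", "level": 1},
        "rec_c": {"composer": "Composer C", "level": 9},
    }
    for level, count in sorted(level_histogram(toy).items()):
        print(f"level {level}: {count} recording(s)")
```

Passing `level_key` explicitly makes it easy to adapt the sketch once the real field name is known.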
This dataset enables research on estimating performance difficulty directly from audio, as well as applications in music education, music information retrieval (MIR) and automatic curriculum generation. The performances are distributed as YouTube links; if any link becomes unavailable, please contact Pedro Ramoneda so that results remain replicable.
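Because the 11 levels are ordered, a difficulty estimator can be scored with metrics that also reward near misses. The sketch below computes exact accuracy, accuracy within one level, and mean squared error over integer levels; these are illustrative choices, not necessarily the evaluation protocol used in the related publication.

```python
from typing import Sequence, Tuple


def ordinal_scores(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (exact accuracy, within-one-level accuracy, mean squared error)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    exact = sum(t == p for t, p in pairs) / n
    within_one = sum(abs(t - p) <= 1 for t, p in pairs) / n
    mse = sum((t - p) ** 2 for t, p in pairs) / n
    return exact, within_one, mse


if __name__ == "__main__":
    # Hypothetical ground-truth and predicted levels for four recordings.
    print(ordinal_scores([1, 2, 3, 4], [1, 3, 3, 7]))
```

Exact accuracy treats a one-level miss the same as a ten-level miss; the within-one accuracy and squared error distinguish the two.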
Format:
- Audio: mp3 (via YouTube links)
- MIDI, CQT and pianoroll files
- Metadata: new_clean_data.json
- Canonical splits for training (train, validation, test): split_audio.json
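Before training, it is worth confirming that no recording appears in more than one of the canonical splits. The sketch below assumes, hypothetically, that split_audio.json parses to an object mapping each split name ("train", "validation", "test") to a list of recording IDs; adapt `load_splits` if the file is organised differently.

```python
import json
from itertools import combinations
from typing import Dict, List


def load_splits(path: str) -> Dict[str, List[str]]:
    """Load the split file (ASSUMED layout: {split_name: [recording_id, ...]})."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def check_disjoint(splits: Dict[str, List[str]]) -> None:
    """Raise ValueError if any recording ID is shared by two splits."""
    for a, b in combinations(splits, 2):
        shared = set(splits[a]) & set(splits[b])
        if shared:
            raise ValueError(f"{len(shared)} ID(s) shared by {a!r} and {b!r}")


if __name__ == "__main__":
    # Toy splits; with the real file use check_disjoint(load_splits("split_audio.json")).
    splits = {"train": ["r1", "r2"], "validation": ["r3"], "test": ["r4"]}
    check_disjoint(splits)  # returns None silently when the splits are disjoint
    print({name: len(ids) for name, ids in splits.items()})
```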
License: Research use only
Related publication:
Pedro Ramoneda, Minhee Lee, Dasaem Jeong, Jose J. Valero-Mas, Xavier Serra, Can Audio Reveal Music Performance Difficulty? Insights From the Piano Syllabus Dataset, IEEE Transactions on Audio, Speech and Language Processing, vol. 33, pp. 1129–1141, 2025. DOI: 10.1109/TASLPRO.2025.3539018
@ARTICLE{10878288,
  author={Ramoneda, Pedro and Lee, Minhee and Jeong, Dasaem and Valero-Mas, Jose J. and Serra, Xavier},
  journal={IEEE Transactions on Audio, Speech and Language Processing},
  title={Can Audio Reveal Music Performance Difficulty? Insights From the Piano Syllabus Dataset},
  year={2025},
  volume={33},
  number={},
  pages={1129--1141},
  doi={10.1109/TASLPRO.2025.3539018},
  keywords={Estimation; Training; Acoustics; Benchmark testing; Multitasking; Speech processing; Prompt engineering; Hidden Markov models; Audio recording; Attention mechanisms; Music difficulty; music information retrieval; music technology education; performance analysis; playability}}
Also available on arXiv: https://arxiv.org/abs/2403.03947
Contact:
For questions or broken YouTube links, please contact Pedro Ramoneda.
Files (2.2 GB in total, including cqt5.zip)
| Size | MD5 checksum |
|---|---|
| 2.1 GB | 5234c456665b0e168d0249ccb93dda7f |
| 52.1 MB | 8e6c4113c09f561336c688021e5cefa1 |
| 12.0 MB | 26c61fabae340639e202803ad45efc8d |
| 76.7 MB | 32a8fb39efe4bec7010b1eaf89716587 |
| 2.3 MB | 63091f9b8036cba2d83e499dab72ce21 |
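After downloading, each file can be checked against the MD5 checksum listed next to its size. The sketch below uses Python's standard `hashlib` and reads the file in chunks so that the 2.1 GB archive does not need to fit in memory; the function names are illustrative, not part of the dataset release.

```python
import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's MD5 hex digest, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_md5: str) -> bool:
    """Return True if the file's MD5 matches the expected checksum (case-insensitive)."""
    return md5sum(path) == expected_md5.strip().lower()
```

For example, `verify(Path("path/to/downloaded_file"), "<md5 from the table>")` returns `False` for a truncated or corrupted download, in which case the file should be fetched again.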