Published February 3, 2025 | Version v1
Dataset Open

Piano Syllabus Dataset

  • 1. Pompeu Fabra University
  • 2. Sogang University
  • 3. University of Alicante

Description

Title: Piano Syllabus Dataset (PSyllabus)

Authors: Pedro Ramoneda, Minhee Lee, Dasaem Jeong, Jose J. Valero-Mas, Xavier Serra

Description:
The Piano Syllabus Dataset (PSyllabus) is a curated collection of 7,901 solo piano recordings annotated with performance difficulty levels drawn from established music education syllabi (e.g., ABRSM, RCM, and Trinity). The dataset spans 1,233 composers and 11 difficulty levels, ranging from beginner to advanced.

Each audio recording is paired with metadata specifying the composer, title, syllabus source, level, and other relevant information. The difficulty levels reflect real-world pedagogical standards and serve as a proxy for playability and technical complexity.

This dataset enables research on difficulty estimation directly from audio, as well as applications in music education, music information retrieval (MIR), and automatic curriculum generation. We provide YouTube links to the performances; if any link becomes unavailable, please contact Pedro Ramoneda so that experiments remain replicable.

Format:

  • Audio: MP3 (via YouTube links)
  • MIDI, CQT, and piano-roll files
  • Metadata: new_clean_data.json
  • Canonical splits (train, validation, test): split_audio.json
  • License: Research use only
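
The field layout of new_clean_data.json and split_audio.json is not documented on this page, so it can help to inspect their top-level structure before writing a loader. The following is a minimal sketch that assumes only that both files are valid JSON; the helper names load_json and describe are illustrative and not part of the dataset.

```python
import json
from pathlib import Path
from typing import Any, Union


def load_json(path: Union[str, Path]) -> Any:
    """Read a JSON file and return the parsed object."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def describe(obj: Any, max_keys: int = 5) -> str:
    """Return a one-line summary of the top-level shape of a parsed JSON value."""
    if isinstance(obj, dict):
        # Show only the first few keys; no particular schema is assumed.
        keys = list(obj)[:max_keys]
        return f"dict with {len(obj)} keys, e.g. {keys}"
    if isinstance(obj, list):
        return f"list with {len(obj)} items"
    return type(obj).__name__


if __name__ == "__main__":
    # File names are taken from the Format list; they are assumed to sit
    # in the current working directory after download.
    for name in ("new_clean_data.json", "split_audio.json"):
        if Path(name).exists():
            print(name, "->", describe(load_json(name)))
        else:
            print(name, "-> not found in the current directory")
```

Once the actual keys are known, the same load_json call can feed whatever split-aware loader a given experiment needs.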

Related publication:
Pedro Ramoneda, Minhee Lee, Dasaem Jeong, Jose J. Valero-Mas, Xavier Serra, Can Audio Reveal Music Performance Difficulty? Insights From the Piano Syllabus Dataset, IEEE Transactions on Audio, Speech and Language Processing, vol. 33, pp. 1129–1141, 2025. DOI: 10.1109/TASLPRO.2025.3539018

@ARTICLE{10878288,
  author={Ramoneda, Pedro and Lee, Minhee and Jeong, Dasaem and Valero-Mas, Jose J. and Serra, Xavier},
  journal={IEEE Transactions on Audio, Speech and Language Processing}, 
  title={Can Audio Reveal Music Performance Difficulty? Insights From the Piano Syllabus Dataset}, 
  year={2025},
  volume={33},
  number={},
  pages={1129--1141},
  doi={10.1109/TASLPRO.2025.3539018},
  keywords={Estimation; Training; Acoustics; Benchmark testing; Multitasking; Speech processing; Prompt engineering; Hidden Markov models; Audio recording; Attention mechanisms; Music difficulty; music information retrieval; music technology education; performance analysis; playability}
}

Also available on arXiv: https://arxiv.org/abs/2403.03947

Contact:
For questions or broken YouTube links, please contact Pedro Ramoneda.

 

Files

cqt5.zip

Files (2.2 GB)

Checksum                               Size
md5:5234c456665b0e168d0249ccb93dda7f   2.1 GB
md5:8e6c4113c09f561336c688021e5cefa1   52.1 MB
md5:26c61fabae340639e202803ad45efc8d   12.0 MB
md5:32a8fb39efe4bec7010b1eaf89716587   76.7 MB
md5:63091f9b8036cba2d83e499dab72ce21   2.3 MB
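
The MD5 checksums listed for each file can be used to confirm that a download is complete and uncorrupted. Below is a minimal sketch using Python's standard hashlib module; the function names md5_of_file and matches_listing are illustrative, and the pairing of a checksum with a specific file name has to be taken from the record page, since it is not recoverable from this listing.

```python
import hashlib
from pathlib import Path
from typing import Union


def md5_of_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reading keeps memory use flat even for multi-gigabyte archives.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: Union[str, Path], listed: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex>' or plain hex."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, matches_listing(downloaded_path, "md5:<hex from the listing>") returns True only when the local file is byte-identical to the published one.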