Piano Syllabus Dataset

Ramoneda, Pedro; Lee, Minhee; Jeong, Dasaem; Valero-Mas, Jose J.; Serra, Xavier

doi:10.5281/zenodo.14794592

Published February 3, 2025 | Version v1

Dataset Open

1. Pompeu Fabra University
2. Sogang University
3. University of Alicante

Title: Piano Syllabus Dataset (PSyllabus)

Authors: Pedro Ramoneda, Minhee Lee, Dasaem Jeong, Jose J. Valero-Mas, Xavier Serra

Description:
The Piano Syllabus Dataset (PSyllabus) is a curated collection of 7,901 solo piano recordings annotated with performance difficulty levels drawn from established music education syllabi (e.g., ABRSM, RCM, and Trinity). The dataset spans 1,233 composers and 11 difficulty levels, ranging from beginner to advanced.

Each audio recording is paired with metadata specifying the composer, title, syllabus source, level, and other relevant information. The difficulty levels reflect real-world pedagogical standards and serve as a proxy for playability and technical complexity.

This dataset enables research on estimating difficulty directly from audio, as well as applications in music education, music information retrieval (MIR), and automatic curriculum generation. We provide YouTube links to the performances; if any link becomes unavailable, please contact Pedro Ramoneda so that experiments remain replicable.

Format:

Audio: mp3 (via YouTube links)
MIDI, CQT, and piano-roll files: mid.zip, cqt5.zip, pianoroll5.zip
Metadata: new_clean_data.json
Canonical splits for training (train, validation, test): split_audio.json
License: Research use only
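
This record does not describe the internal schema of new_clean_data.json or split_audio.json, so a reasonable first step after downloading is to inspect their top-level structure. The following is a minimal Python sketch, not part of the dataset's own tooling. The helper name summarize_json is hypothetical, and the only assumptions are that both files are valid UTF-8 JSON and sit in the current working directory.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Return the top-level type, size, and up to five sample keys of a JSON file.

    No field names are assumed: the function only reports what it finds.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)

    summary = {"type": type(data).__name__, "size": None, "sample_keys": []}
    if isinstance(data, (dict, list)):
        summary["size"] = len(data)
    if isinstance(data, dict):
        # Top-level object: show a few of its keys.
        summary["sample_keys"] = list(data)[:5]
    elif isinstance(data, list) and data and isinstance(data[0], dict):
        # Top-level array of objects: show a few keys of the first entry.
        summary["sample_keys"] = list(data[0])[:5]
    return summary


if __name__ == "__main__":
    # File names are taken from the Format list; their location is an assumption.
    for name in ("new_clean_data.json", "split_audio.json"):
        if Path(name).exists():
            print(name, summarize_json(name))
        else:
            print(name, "not found in the current directory")
```

Printing the summary before writing any loader avoids hard-coding field names that the record itself does not specify.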

Related publication:
Pedro Ramoneda, Minhee Lee, Dasaem Jeong, Jose J. Valero-Mas, Xavier Serra, Can Audio Reveal Music Performance Difficulty? Insights From the Piano Syllabus Dataset, IEEE Transactions on Audio, Speech and Language Processing, vol. 33, pp. 1129–1141, 2025. DOI: 10.1109/TASLPRO.2025.3539018

@ARTICLE{10878288,
author={Ramoneda, Pedro and Lee, Minhee and Jeong, Dasaem and Valero-Mas, Jose J. and Serra, Xavier},
journal={IEEE Transactions on Audio, Speech and Language Processing},
title={Can Audio Reveal Music Performance Difficulty? Insights From the Piano Syllabus Dataset},
year={2025},
volume={33},
number={},
pages={1129--1141},
doi={10.1109/TASLPRO.2025.3539018},
keywords={Estimation; Training; Acoustics; Benchmark testing; Multitasking; Speech processing; Prompt engineering; Hidden Markov models; Audio recording; Attention mechanisms; Music difficulty; music information retrieval; music technology education; performance analysis; playability}
}

Also available on arXiv: https://arxiv.org/abs/2403.03947

Contact:
For questions or broken YouTube links, please contact Pedro Ramoneda.

Files

Files (2.2 GB)

Name	Size	MD5
cqt5.zip	2.1 GB	5234c456665b0e168d0249ccb93dda7f
mid.zip	52.1 MB	8e6c4113c09f561336c688021e5cefa1
new_clean_data.json	12.0 MB	26c61fabae340639e202803ad45efc8d
pianoroll5.zip	76.7 MB	32a8fb39efe4bec7010b1eaf89716587
split_audio.json	2.3 MB	63091f9b8036cba2d83e499dab72ce21
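
The MD5 checksums listed above can be used to confirm that each download completed intact. The following is a minimal sketch using Python's standard hashlib module. The function names md5_of and verify_downloads are illustrative, and the code assumes all files were saved under their original names in a single directory.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "cqt5.zip": "5234c456665b0e168d0249ccb93dda7f",
    "mid.zip": "8e6c4113c09f561336c688021e5cefa1",
    "new_clean_data.json": "26c61fabae340639e202803ad45efc8d",
    "pianoroll5.zip": "32a8fb39efe4bec7010b1eaf89716587",
    "split_audio.json": "63091f9b8036cba2d83e499dab72ce21",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each file name to True (match), False (mismatch), or None (missing)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of(path) == expected) if path.exists() else None
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        status = {True: "OK", False: "MISMATCH", None: "missing"}[ok]
        print(f"{name}: {status}")
```

Reading in chunks keeps memory use flat even for the 2.1 GB cqt5.zip archive.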
