PianoVAM: A Multimodal Piano Performance Dataset

Yonghyun Kim; Junhyung Park; Joonhyung Bae; Kirak Kim; Taegyun Kwon; Alexander Lerch; Juhan Nam

doi:10.5281/zenodo.17706504

There is a newer version of the record available.

Published September 21, 2025 | Version v1

Conference paper Open

The multimodal nature of music performance has driven increasing interest in data beyond the audio domain within the music information retrieval (MIR) community. This paper introduces PianoVAM, a comprehensive piano performance dataset that includes videos, audio, MIDI, hand landmarks, fingering labels, and rich metadata. The dataset was recorded using a Disklavier piano, capturing audio and MIDI from amateur pianists during their daily practice sessions, alongside synchronized top-view videos in realistic and varied performance conditions. Hand landmarks and fingering labels were extracted using a pretrained hand pose estimation model and a semi-automated fingering detection algorithm. We discuss the challenges encountered during data collection and the alignment process across different modalities. Additionally, we describe our fingering detection method based on hand landmarks extracted from videos. Finally, we present experimental results on both audio-only and audio-visual piano transcription using the PianoVAM dataset for benchmarking purposes and discuss other potential applications.

Files

Name	Size
000061.pdf (md5:bc5ad12f4cb457164eba5befc7847ab0)	334.2 kB

Resource type

Conference paper

Publisher

ISMIR

Imprint

Proceedings of the 26th International Society for Music Information Retrieval Conference, 542-549. Daejeon, South Korea.

Conference

International Society for Music Information Retrieval Conference (ISMIR 2025), Daejeon, South Korea and Online, September 21-25, 2025

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: November 25, 2025
Modified: November 25, 2025