PianoBind: A Multi-Modal Joint Embedding Model for Pop-Piano Music

Hayeon Bang; Eunjin Choi; Seungheon Doh; Juhan Nam

doi:10.5281/zenodo.17706422

There is a newer version of the record available.

Published September 21, 2025 | Version v1

Conference paper Open

Solo piano music, despite being a single-instrument medium, possesses significant expressive capabilities, conveying rich semantic information across genres, moods, and styles. However, current general-purpose music representation models, predominantly trained on large-scale datasets, often struggle to capture subtle semantic distinctions within homogeneous solo piano music. Furthermore, existing piano-specific representation models are typically unimodal, failing to capture the inherently multimodal nature of piano music, expressed through audio, symbolic, and textual modalities. To address these limitations, we propose PianoBind, a piano-specific multimodal joint embedding model. We systematically investigate strategies for multi-source training and modality utilization within a joint embedding framework optimized for capturing fine-grained semantic distinctions in (1) small-scale and (2) homogeneous piano datasets. Our experimental results demonstrate that PianoBind learns multimodal representations that effectively capture subtle nuances of piano music, achieving superior text-to-music retrieval performance on in-domain and out-of-domain piano datasets compared to general-purpose music joint embedding models. Moreover, our design choices offer reusable insights for multimodal representation learning with homogeneous datasets beyond piano music.

Files

000045.pdf (315.4 kB, md5:740153a0841f91d5193887feac7f5643)

Resource type

Conference paper

Publisher

ISMIR

Imprint

Proceedings of the 26th International Society for Music Information Retrieval Conference, 405-412. Daejeon, South Korea.

Conference

International Society for Music Information Retrieval Conference (ISMIR 2025), Daejeon, South Korea and Online, September 21-25, 2025

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: November 25, 2025
Modified: November 25, 2025
