There is a newer version of the record available.

Published September 21, 2025 | Version v1

Scaling Self-Supervised Representation Learning for Symbolic Piano Performance

Description

We study the capabilities of generative autoregressive transformer models trained on large amounts of symbolic solo-piano transcriptions. After first pre-training on approximately 60,000 hours of music, we use a comparatively smaller, high-quality subset to fine-tune models to produce coherent musical generations, perform symbolic classification tasks, and, by adapting the SimCLR framework to symbolic music, produce general-purpose contrastive MIDI embeddings. The resulting models perform well on a variety of standard benchmarks, demonstrating the generalizability of the autoregressive representations learned during pre-training, and often require only a few hundred gradient updates to specialize fully to different generative and MIR tasks.
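
For readers unfamiliar with SimCLR, the contrastive objective it is built on (the NT-Xent loss) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function name `nt_xent_loss`, the temperature of 0.1, and the use of random vectors in place of real MIDI embeddings are all hypothetical, and the description above does not specify how the two views of a performance are constructed.

```python
import numpy as np


def nt_xent_loss(z_a, z_b, temperature=0.1):
    """NT-Xent loss over N positive pairs.

    z_a[i] and z_b[i] are embeddings of two views of the same item.
    The temperature value is an illustrative assumption.
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0).astype(np.float64)  # (2N, d)
    z /= np.linalg.norm(z, axis=1, keepdims=True)  # unit norm: dot product = cosine

    sim = z @ z.T / temperature       # (2N, 2N) scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)    # an embedding is never its own negative

    # Row i in the first half is paired with row i + N, and vice versa.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-softmax of each row, evaluated at the positive.
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob_pos = sim[np.arange(2 * n), pos] - log_denom
    return float(-log_prob_pos.mean())
```

The loss is low when each embedding is closer to its paired view than to every other embedding in the batch, which is the property that makes the resulting embeddings useful for retrieval and classification.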

Files

000052.pdf (549.1 kB)
md5:a0a5568fe45a0c52579b5a8b4cd0f5a2