Exploring Network Adaptations for Minimum Latency Real-Time Piano Transcription

Patricia Hu; Silvan Peter; Jan Schlüter; Gerhard Widmer

doi:10.5281/zenodo.17706339

There is a newer version of the record available.

Published September 21, 2025 | Version v1

Conference paper Open


Advances in neural network design and the availability of large-scale labeled datasets have driven major improvements in piano transcription. Existing approaches target either offline applications, with no restrictions on computational demands, or online transcription, with delays of 160–320 ms. However, most real-time musical applications require latencies below 30 ms. In this work, we investigate whether and how the current state-of-the-art online transcription model can be adapted for real-time piano transcription. Specifically, we eliminate all non-causal processing and reduce the computational load by sharing computations across core model components and by varying the model size. Additionally, we explore different pre- and postprocessing strategies and related label encoding schemes, and discuss their suitability for real-time transcription. Evaluating the adaptations on the MAESTRO dataset, we find a drop in transcription accuracy due to strictly causal processing, as well as a trade-off between preprocessing latency and prediction accuracy. We release our system as a baseline to support researchers in designing models for minimum-latency real-time transcription.
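To illustrate what "eliminating non-causal processing" means in general terms, the sketch below contrasts a centered 1-D convolution, which looks ahead at future samples, with a strictly causal one, which only uses the current and past samples. It is a minimal, generic illustration and not the model described in the paper; the function names `conv_with_padding`, `causal_conv1d`, and `centered_conv1d` are hypothetical and chosen only for this example.

```python
def conv_with_padding(x, kernel, left, right):
    """Convolve x with kernel after zero-padding; requires left + right == len(kernel) - 1."""
    k = len(kernel)
    padded = [0.0] * left + list(x) + [0.0] * right
    # Output sample t combines kernel[j] with padded[t + k - 1 - j].
    return [
        sum(kernel[j] * padded[t + k - 1 - j] for j in range(k))
        for t in range(len(x))
    ]


def causal_conv1d(x, kernel):
    """Strictly causal: y[t] = sum_j kernel[j] * x[t - j] (no future samples)."""
    return conv_with_padding(x, kernel, left=len(kernel) - 1, right=0)


def centered_conv1d(x, kernel):
    """Centered: y[t] = sum_j kernel[j] * x[t + half - j], so it looks `half` samples ahead."""
    half = len(kernel) // 2
    return conv_with_padding(x, kernel, left=len(kernel) - 1 - half, right=half)


# Illustrative check: an impulse at index 5 must not affect earlier causal outputs.
silence = [0.0] * 8
impulse = list(silence)
impulse[5] = 1.0
kernel = [0.5, 0.25, 0.25]

causal_unchanged = causal_conv1d(silence, kernel)[:5] == causal_conv1d(impulse, kernel)[:5]
centered_leaks = centered_conv1d(silence, kernel)[4] != centered_conv1d(impulse, kernel)[4]
```

In the causal variant, the output at index 4 is identical for both inputs, whereas the centered variant already reacts to the impulse one sample early. In a real-time setting that lookahead would translate directly into added latency, which is why a minimum-latency system has to avoid it.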

Files

000010.pdf (150.9 kB)
md5:1e3456b2e15366a1d61989ddb6c29751


Resource type

Conference paper

Publisher

ISMIR

Imprint

Proceedings of the 26th International Society for Music Information Retrieval Conference, 97-104. Daejeon, South Korea.

Conference

International Society for Music Information Retrieval Conference (ISMIR 2025), Daejeon, South Korea and Online, September 21-25, 2025

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: November 25, 2025
Modified: November 25, 2025
