Published September 21, 2025 | Version v1
Conference paper
Open
Exploring Network Adaptations for Minimum Latency Real-Time Piano Transcription
Authors/Creators
Description
Advances in neural network design and the availability of large-scale labeled datasets have driven major improvements in piano transcription. Existing approaches target either offline applications, with no restrictions on computational demands, or online transcription, with delays of 160–320 ms. However, most real-time musical applications require latencies below 30 ms.
In this work, we investigate whether and how the current state-of-the-art online transcription model can be adapted for real-time piano transcription.
Specifically, we eliminate all non-causal processing and reduce the computational load by sharing computations across core model components and by varying the model size.
Additionally, we explore different pre- and postprocessing strategies, together with their related label encoding schemes, and discuss their suitability for real-time transcription.
Evaluating the adaptations on the MAESTRO dataset, we find a drop in transcription accuracy due to strictly causal processing, as well as a tradeoff between preprocessing latency and prediction accuracy.
We release our system as a baseline to support researchers in designing models for minimum-latency real-time transcription.
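To make the latency constraint and the notion of strictly causal processing concrete, the following minimal sketch may help. It is not the system described in this paper: the sample rate, window sizes, kernel, and all function names are assumptions chosen purely for illustration. It shows how long an analysis window lasts in milliseconds, which window sizes would fit a 30 ms budget at the assumed sample rate, and a causal 1-D convolution whose output at time t never reads later input samples.

```python
def window_duration_ms(window_samples: int, sample_rate: int) -> float:
    """Time span covered by one analysis window, in milliseconds."""
    return 1000.0 * window_samples / sample_rate


def max_window_samples(budget_ms: float, sample_rate: int) -> int:
    """Largest window (in samples) whose duration fits within budget_ms."""
    return int(budget_ms * sample_rate / 1000.0)


def causal_conv1d(x, kernel):
    """Causal 1-D convolution: y[t] = sum_k kernel[k] * x[t - k].

    Samples before t = 0 are treated as zero (left padding), so y[t]
    never depends on x[t + 1] or any later sample.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            if t - k >= 0:
                acc += w * x[t - k]
        y.append(acc)
    return y


if __name__ == "__main__":
    sample_rate = 16_000  # assumed value, not taken from the paper

    # Hypothetical window sizes: longer windows span more time.
    for n in (2048, 512, 256):
        print(f"{n} samples -> {window_duration_ms(n, sample_rate)} ms")

    print("max window within 30 ms:", max_window_samples(30.0, sample_rate))

    # Causality check: changing a future sample leaves earlier outputs intact.
    kernel = [0.5, 0.5]  # hypothetical two-tap averaging kernel
    a = causal_conv1d([1.0, 2.0, 3.0, 4.0], kernel)
    b = causal_conv1d([1.0, 2.0, 3.0, 100.0], kernel)
    print("outputs before the changed sample match:", a[:3] == b[:3])
```

Under these assumptions, a longer window gives the model more context per frame but takes longer to fill, which is one simple way to picture a tradeoff between preprocessing latency and prediction accuracy.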
Files
000010.pdf
(150.9 kB, md5:1e3456b2e15366a1d61989ddb6c29751)