Published February 2, 2024 | Version 1.0.0
Model Open

A Data-Driven Analysis of Robust Automatic Piano Transcription

  • 1. Queen Mary University of London
  • 2. Yamaha (Japan)

Description

This is a re-trained version of the model of Kong et al. [1], trained with the data augmentation techniques described in our pending IEEE Signal Processing Letters publication "A Data-Driven Analysis of Robust Automatic Piano Transcription".

MAPS test set out-of-dataset evaluation:

Model                   Precision  Recall  F1
Hawthorne et al. [2]    87.5       85.6    86.4
Kong et al. [1]         78.3       87.2    82.4
Maman and Bermano [3]   88.2       86.5    87.3
Toyama et al. [4]       84.6       85.7    85.1
Ours                    89.5       87.4    88.4

 

On the MAESTRO test set, we achieve a note onset F1 score of 96.6, compared to 96.7 for Kong et al. [1]. Previous publications ([1], [2], [4]) train without any data augmentation because augmentation has been shown to slightly hurt test set performance. We view this as plain overfitting and encourage future research to focus more on generalization than on in-distribution test set metrics.


Note: this checkpoint does not include pedal predictions, so you should use the Regress_onset_offset_frame_velocity_CRNN module when loading the weights.
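The note above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not a tested loading recipe: it assumes that `models.py` from the original repository is importable, that the constructor takes `frames_per_second` and `classes_num` (set here to 100 and 88, the values in the original repository's configuration), and that the weights are stored either at the top level of the checkpoint or nested under a "model" (and possibly "note_model") key. The helper names `extract_state_dict` and `load_note_model` are hypothetical.

```python
from collections.abc import Mapping
from typing import Any


def extract_state_dict(checkpoint: Mapping) -> Any:
    """Return the note-model weights from a loaded checkpoint object.

    Assumption: the weights are either the checkpoint itself or nested
    under a "model" key (and possibly a further "note_model" key).
    Adjust this if the file is laid out differently.
    """
    state = checkpoint
    for key in ("model", "note_model"):
        if isinstance(state, Mapping) and key in state:
            state = state[key]
    return state


def load_note_model(checkpoint_path: str):
    """Build the note-only CRNN and load the checkpoint weights on CPU."""
    # Imported lazily so extract_state_dict works without PyTorch installed.
    import torch
    # Assumption: models.py from the original repository is on the path.
    from models import Regress_onset_offset_frame_velocity_CRNN

    # Assumption: 100 frames per second and 88 piano keys.
    model = Regress_onset_offset_frame_velocity_CRNN(
        frames_per_second=100, classes_num=88
    )
    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    model.load_state_dict(extract_state_dict(checkpoint))
    model.eval()  # inference mode: disables dropout, freezes batch-norm stats
    return model
```

A call such as `load_note_model("checkpoint.pth")` (the file name is a placeholder) would then return the note-only model ready for inference. If `load_state_dict` reports missing or unexpected keys, the assumed checkpoint layout does not match the file.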

Source code of the original model implementation is available at: https://github.com/bytedance/piano_transcription

[1] Qiuqiang Kong, Bochen Li, Xuchen Song, Yuan Wan, and Yuxuan Wang, “High-resolution piano transcription with pedals by regressing onset and offset times,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 3707–3717, 2021.
[2] Curtis Hawthorne, Andriy Stasyuk, Adam Roberts, Ian Simon, Cheng-Zhi Anna Huang, Sander Dieleman, Erich Elsen, Jesse Engel, and Douglas Eck, “Enabling factorized piano music modeling and generation with the MAESTRO dataset,” in International Conference on Learning Representations, 2019.
[3] Ben Maman and Amit H. Bermano, “Unaligned supervision for automatic music transcription in the wild,” in International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. 2022, vol. 162 of Proceedings of Machine Learning Research, pp. 14918–14934, PMLR.
[4] Keisuke Toyama, Taketo Akama, Yukara Ikemiya, Yuhta Takida, Wei-Hsiang Liao, and Yuki Mitsufuji, “Automatic piano transcription with hierarchical frequency-time transformer,” in Proceedings of the 24th International Society for Music Information Retrieval Conference, Milan, Italy, 2023, pp. 215–222.

Files

Size: 103.8 MB
md5:cd449c03d690e97dfe5d7c311ac8fa8c
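
The MD5 digest listed for the file can be used to check that a download completed intact. The following is a minimal sketch using only the Python standard library; the function names and the file path passed to them are hypothetical, and only the digest value comes from this record.

```python
import hashlib

# Digest as listed in the file entry of this record.
EXPECTED_MD5 = "cd449c03d690e97dfe5d7c311ac8fa8c"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read 1 MiB at a time so the whole file is never held in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_checkpoint("checkpoint.pth")` (placeholder file name) returns False for a truncated or corrupted download. MD5 is suitable here as an integrity check only, not as protection against deliberate tampering.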