A Data-Driven Analysis of Robust Automatic Piano Transcription
Description
This is a re-trained version of the model from [1], using the data augmentation techniques described in our pending IEEE Signal Processing Letters publication "A Data-Driven Analysis of Robust Automatic Piano Transcription".
MAPS test set out-of-dataset evaluation:
| Model | Precision | Recall | F1 |
|---|---|---|---|
| Hawthorne et al. [2] | 87.5 | 85.6 | 86.4 |
| Kong et al. [1] | 78.3 | 87.2 | 82.4 |
| Maman and Bermano [3] | 88.2 | 86.5 | 87.3 |
| Toyama et al. [4] | 84.6 | 85.7 | 85.1 |
| Ours | 89.5 | 87.4 | 88.4 |
On the MAESTRO test set, we achieve a note onset F1 score of 96.6, compared to 96.7 for Kong et al. [1]. Previous publications ([1], [2], [4]) train without any data augmentation, as it has been shown to slightly hurt test set performance. We view this as plain overfitting and encourage future research to focus more on generalization than on in-distribution test set metrics.
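For readers unfamiliar with the note onset metric quoted above, the following is a minimal sketch of how precision, recall, and F1 can be computed from reference and estimated notes. It is not the evaluation code behind the reported numbers: the function name `note_onset_f1`, the `(onset_seconds, midi_pitch)` note format, and the 50 ms tolerance are assumptions made for illustration, and established toolkits such as mir_eval use bipartite matching rather than this simple per-pitch scan.

```python
from collections import defaultdict


def note_onset_f1(ref_notes, est_notes, tolerance=0.05):
    """Onset-only precision, recall, and F1.

    ref_notes / est_notes: iterables of (onset_seconds, midi_pitch).
    tolerance: assumed onset tolerance in seconds (50 ms here).
    """
    def by_pitch(notes):
        # Group onset times by pitch and sort them in time.
        grouped = defaultdict(list)
        for onset, pitch in notes:
            grouped[pitch].append(onset)
        for onsets in grouped.values():
            onsets.sort()
        return grouped

    ref, est = by_pitch(ref_notes), by_pitch(est_notes)
    matched = 0
    for pitch, ref_onsets in ref.items():
        est_onsets = est.get(pitch, [])
        i = j = 0
        # Two-pointer scan: each note is matched at most once.
        while i < len(ref_onsets) and j < len(est_onsets):
            diff = est_onsets[j] - ref_onsets[i]
            if abs(diff) <= tolerance:
                matched += 1
                i += 1
                j += 1
            elif diff < 0:
                j += 1  # estimate is too early for this reference note
            else:
                i += 1  # reference note has no estimate close enough

    n_ref = sum(len(v) for v in ref.values())
    n_est = sum(len(v) for v in est.values())
    precision = matched / n_est if n_est else 0.0
    recall = matched / n_ref if n_ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Scores computed this way are fractions in [0, 1]; the values in this description are reported as percentages.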
Note: this checkpoint does not include pedal predictions, so you should use the Regress_onset_offset_frame_velocity_CRNN module when loading the weights.
Source code of the original model implementation is available at: https://github.com/bytedance/piano_transcription
[1] Qiuqiang Kong, Bochen Li, Xuchen Song, Yuan Wan, and Yuxuan Wang, “High-resolution piano transcription with pedals by regressing onset and offset times,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 3707–3717, 2021.
[2] Curtis Hawthorne, Andriy Stasyuk, Adam Roberts, Ian Simon, Cheng-Zhi Anna Huang, Sander Dieleman, Erich Elsen, Jesse Engel, and Douglas Eck, “Enabling factorized piano music modeling and generation with the MAESTRO dataset,” in International Conference on Learning Representations, 2019.
[3] Ben Maman and Amit H. Bermano, “Unaligned supervision for automatic music transcription in the wild,” in International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA, 2022, vol. 162 of Proceedings of Machine Learning Research, pp. 14918–14934, PMLR.
[4] Keisuke Toyama, Taketo Akama, Yukara Ikemiya, Yuhta Takida, Wei-Hsiang Liao, and Yuki Mitsufuji, “Automatic piano transcription with hierarchical frequency-time transformer,” in Proceedings of the 24th International Society for Music Information Retrieval Conference, Milan, Italy, 2023, pp. 215–222.
Files
| Size | Checksum |
|---|---|
| 103.8 MB | md5:cd449c03d690e97dfe5d7c311ac8fa8c |
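After downloading, the file can be checked against the md5 checksum listed above. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `checkpoint_is_intact` are hypothetical, and the path you pass in is whatever name the downloaded checkpoint has on your machine.

```python
import hashlib

# Checksum listed for the checkpoint file in this record.
EXPECTED_MD5 = "cd449c03d690e97dfe5d7c311ac8fa8c"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checkpoint_is_intact(path, expected=EXPECTED_MD5):
    """True if the file's md5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again before loading the weights.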