Published November 4, 2019 | Version v1
Conference paper · Open Access

Blending Acoustic and Language Model Predictions for Automatic Music Transcription

Description

In this paper, we introduce a method for converting an input probabilistic piano roll (the output of a typical multi-pitch detection model) into a binary piano roll. This task is an important step in many automatic music transcription systems, whose goal is to convert an audio recording into a symbolic format. Our model has two components: an LSTM-based music language model (MLM), which can be trained on any MIDI data, not just data aligned with audio; and a blending model, which combines the probabilities of the MLM with those of the input probabilistic piano roll produced by an acoustic multi-pitch detection model, and which must be trained on a (comparably small) amount of aligned data. We use scheduled sampling to make the MLM robust to noisy sequences at test time. We analyze the performance of our model on the MAPS dataset using two different timesteps (40ms and 16th-note), comparing it against a strong baseline hidden Markov model (HMM) trained with a method that, to our knowledge, has not previously been used for this task. We report a statistically significant improvement over HMM decoding in terms of notewise F-measure at both timesteps, with 16th-note timesteps improving results further compared to 40ms timesteps.
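To make the two components concrete, below is a minimal, hypothetical PyTorch sketch of the decoding pipeline: an LSTM frame model stands in for the MLM, a fixed-weight geometric mean stands in for the blending step, and a greedy thresholded loop produces the binary piano roll. The class name FrameLSTM, the layer sizes, and the blend() rule are illustrative assumptions, not the paper's exact architecture; in particular, the paper learns its blending model from aligned data rather than fixing a weight alpha.

    import torch
    import torch.nn as nn

    N_PITCHES = 88  # piano pitch range

    class FrameLSTM(nn.Module):
        """Stand-in MLM: predicts P(frame_t | frames_<t) over 88 pitches."""
        def __init__(self, hidden=256):
            super().__init__()
            self.lstm = nn.LSTM(N_PITCHES, hidden, batch_first=True)
            self.out = nn.Linear(hidden, N_PITCHES)

    def blend(p_acoustic, p_mlm, alpha=0.5):
        # Weighted geometric mean: one simple combination rule (a product
        # of experts); the paper trains a blending model instead of
        # fixing alpha.
        return p_acoustic ** alpha * p_mlm ** (1.0 - alpha)

    @torch.no_grad()
    def decode(p_acoustic, mlm, threshold=0.5):
        """Greedy left-to-right decoding of a (B, T, 88) probabilistic
        piano roll into a binary one, feeding each thresholded frame
        back into the MLM."""
        B, T, _ = p_acoustic.shape
        binary = torch.zeros_like(p_acoustic)
        prev = torch.zeros(B, 1, N_PITCHES)  # all-silent first input
        state = None
        for t in range(T):
            h, state = mlm.lstm(prev, state)
            p_mlm = torch.sigmoid(mlm.out(h))[:, 0]      # (B, 88)
            p = blend(p_acoustic[:, t], p_mlm)
            prev = (p > threshold).float().unsqueeze(1)  # feed decision back
            binary[:, t] = prev[:, 0]
        return binary

Replacing blend() with a small learned network over the two posteriors would be closer in spirit to the trained blending model described in the abstract.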
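Scheduled sampling, used above to robustify the MLM, can be sketched as a training loop in which the ground-truth previous frame is sometimes replaced by the model's own thresholded prediction. The sketch below reuses the hypothetical FrameLSTM from the previous block; the per-sequence coin flip and fixed threshold are assumptions, and sample_prob would typically be annealed upward over training so the model gradually sees more of its own noisy outputs.

    import torch
    import torch.nn.functional as F

    def scheduled_sampling_step(mlm, target, sample_prob, threshold=0.5):
        """One training step on a (B, T, 88) binary piano roll `target`
        (float tensor of 0s and 1s); returns the mean frame-wise BCE loss."""
        B, T, P = target.shape
        prev = torch.zeros(B, 1, P)  # all-silent first input
        state, losses = None, []
        for t in range(T):
            h, state = mlm.lstm(prev, state)
            p = torch.sigmoid(mlm.out(h))[:, 0]           # (B, 88)
            losses.append(F.binary_cross_entropy(p, target[:, t]))
            # With probability sample_prob, feed back the model's own
            # prediction instead of the ground-truth frame.
            use_model = torch.rand(B, 1) < sample_prob
            model_frame = (p > threshold).float()
            prev = torch.where(use_model, model_frame, target[:, t]).unsqueeze(1)
        return torch.stack(losses).mean()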

Files

ismir2019_paper_000054.pdf (308.4 kB)
md5:1e9ec76f9a068923330b90300f327449