Understanding Performance Limitations in Automatic Drum Transcription

Philipp Weyers; Christian Uhle; Meinard Müller; Matthias Lang

doi:10.5281/zenodo.17706523

There is a newer version of the record available.

Published September 21, 2025 | Version v1

Conference paper Open

Recent advancements in Automatic Drum Transcription (ADT) have improved overall transcription performance. However, state-of-the-art (SOTA) models still struggle with certain drum classes, particularly toms and cymbals, and the specific factors limiting their performance remain unclear. This paper addresses this gap by leveraging the Separate-Tracks-Annotate-Resynthesize Drums (STAR Drums) dataset to create multiple dataset versions that systematically eliminate potential performance constraints. We conduct experiments using three common ADT deep neural network (DNN) architectures to identify and quantify these limitations. For drum transcription in the presence of melodic instruments (DTM), the primary limiting factor is interference from melodic instruments and singing. Aside from this, performance improves by approximately five percent when training and testing use the same single drum kit, only strong onsets are present, or notes are not played simultaneously. For drum transcription of drum-only recordings (DTD), nearly error-free transcription is achieved when simultaneous onsets are removed. This confirms that overlapping drum hits are the main performance constraint. By identifying key ADT challenges, we provide insights to enhance SOTA models and improve overall transcription accuracy.

Files

Name	Size
000067.pdf md5:58ab723cc9a9e75753857f46b0cdccbd	123.7 kB

	All versions	This version
Views	159	100
Downloads	193	171
Data volume	25.0 MB	22.1 MB

Resource type

Conference paper

Publisher

ISMIR

Imprint

Proceedings of the 26th International Society for Music Information Retrieval Conference, 596-602. Daejeon, South Korea.

Conference

International Society for Music Information Retrieval Conference (ISMIR 2025), Daejeon, South Korea and Online, September 21-25, 2025

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: November 25, 2025
Modified: November 25, 2025
