Reformulating Soft Dynamic Time Warping: Insights Into Target Artifacts and Prediction Quality

Johannes Zeitler; Meinard Müller

doi:10.5281/zenodo.17706349

There is a newer version of the record available.

Published September 21, 2025 | Version v1

Conference paper Open

Training deep neural networks for music information retrieval (MIR) often relies on strongly aligned data, where each frame has a precisely annotated target label. To reduce this dependency, soft dynamic time warping (SDTW) enables training with weakly aligned data by replacing hard decisions with weighted sums, allowing for gradient-based learning while aligning feature sequences to shorter, often binary, target sequences. However, SDTW introduces gradient artifacts that can cause blurring and degrade predictions, impacting the learning process. In this work, we analyze the sources and effects of these artifacts and propose a reformulation of SDTW that expresses its gradient in terms of an equivalent strongly aligned target representation. This reformulation provides an intuitive interpretation of learned representations and insights into the impact of SDTW hyperparameters on the prediction quality. Using multi-pitch estimation as a case study, we systematically investigate these modified targets and demonstrate their potential for improving training stability, interpretability, and alignment quality in MIR tasks.
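The core idea mentioned in the abstract, replacing the hard minimum of classical DTW with a weighted (soft) minimum so that the alignment cost becomes differentiable, can be illustrated with a minimal sketch. This is not the authors' implementation or their proposed reformulation. The function names `softmin` and `sdtw_cost`, the scalar features, the squared-error local cost, the step set (vertical, horizontal, diagonal), and the temperature parameter `gamma` are all assumptions chosen for illustration.

```python
import math


def softmin(values, gamma):
    """Smooth minimum: -gamma * log(sum(exp(-v / gamma))).

    The minimum is subtracted before exponentiating for numerical
    stability. As gamma approaches 0 this approaches the hard minimum.
    """
    m = min(values)
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))


def sdtw_cost(pred, target, gamma=1.0):
    """Soft-DTW cost between two scalar sequences (illustrative sketch).

    Assumptions: squared-error local cost and the three standard steps
    (i-1, j), (i, j-1), (i-1, j-1). The paper may use other choices.
    """
    n, m = len(pred), len(target)
    inf = float("inf")
    # Accumulated cost matrix with a border of +inf and D[0][0] = 0.
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (pred[i - 1] - target[j - 1]) ** 2
            D[i][j] = local + softmin(
                [D[i - 1][j], D[i][j - 1], D[i - 1][j - 1]], gamma
            )
    return D[n][m]


# A longer prediction sequence aligned to a shorter binary target sequence.
pred = [0.0, 0.0, 1.0, 1.0]
target = [0.0, 1.0]
near_hard = sdtw_cost(pred, target, gamma=0.01)  # close to classical DTW
smooth = sdtw_cost(pred, target, gamma=1.0)      # stronger smoothing
```

In this sketch, a small `gamma` makes the result behave like the classical DTW cost, while a larger `gamma` spreads weight over many alignment paths. That spreading is the kind of smoothing the abstract associates with blurred gradients, although the sketch itself does not compute gradients.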

Files

000015.pdf (443.2 kB), md5:eff508ee74e4f00c9d2d34548ec48207

Resource type

Conference paper

Publisher

ISMIR

Imprint

Proceedings of the 26th International Society for Music Information Retrieval Conference, 141-147. Daejeon, South Korea.

Conference

International Society for Music Information Retrieval Conference (ISMIR 2025), Daejeon, South Korea and Online, September 21-25, 2025

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: November 25, 2025
Modified: November 25, 2025
