Published September 21, 2025 | Version v1
Conference paper
Open
Reformulating Soft Dynamic Time Warping: Insights Into Target Artifacts and Prediction Quality
Authors/Creators
Description
Training deep neural networks for music information retrieval (MIR) often relies on strongly aligned data, where each frame has a precisely annotated target label. To reduce this dependency, soft dynamic time warping (SDTW) enables training with weakly aligned data by replacing hard decisions with weighted sums, allowing for gradient-based learning while aligning feature sequences to shorter, often binary, target sequences. However, SDTW introduces gradient artifacts that can cause blurring and degrade predictions, impacting the learning process. In this work, we analyze the sources and effects of these artifacts and propose a reformulation of SDTW that expresses its gradient in terms of an equivalent strongly aligned target representation. This reformulation provides an intuitive interpretation of learned representations and insights into the impact of SDTW hyperparameters on the prediction quality. Using multi-pitch estimation as a case study, we systematically investigate these modified targets and demonstrate their potential for improving training stability, interpretability, and alignment quality in MIR tasks.
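To make "replacing hard decisions with weighted sums" concrete, here is a minimal NumPy sketch of the generic soft-DTW recursion, in which the hard minimum over predecessor cells becomes a smooth minimum controlled by a temperature `gamma`. It is not the reformulation proposed in the paper. The function names `softmin` and `soft_dtw`, the standard step set (vertical, horizontal, diagonal), and the toy cost matrix are assumptions chosen for illustration.

```python
import numpy as np

def softmin(values, gamma):
    """Smooth minimum: -gamma * log(sum(exp(-v / gamma))), shifted for stability."""
    v = np.asarray(values, dtype=float)
    m = v.min()  # subtracting the minimum keeps the exponentials in range
    return m - gamma * np.log(np.sum(np.exp(-(v - m) / gamma)))

def soft_dtw(cost, gamma=1.0):
    """Soft-DTW value for an (N, M) local cost matrix (assumed standard step set)."""
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated cost with an infinite border
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Hard DTW would take min(...) here; soft-DTW uses a weighted sum instead.
            prev = [acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]]
            acc[i, j] = cost[i - 1, j - 1] + softmin(prev, gamma)
    return acc[n, m]

# Toy example: 4 feature frames aligned to a shorter, binary 2-step target.
cost = np.array([[0.0, 1.0],
                 [0.0, 1.0],
                 [1.0, 0.0],
                 [1.0, 0.0]])
sharp = soft_dtw(cost, gamma=0.01)   # close to the hard DTW cost
smooth = soft_dtw(cost, gamma=1.0)   # many paths contribute to the value
```

As `gamma` shrinks, the soft minimum approaches the hard minimum and the result approaches the classical DTW cost. Larger values spread weight over more alignment paths, which is the setting in which the blurring described in the abstract can arise.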
Files
| Name | Size | Checksum |
|---|---|---|
| 000015.pdf | 443.2 kB | md5:eff508ee74e4f00c9d2d34548ec48207 |