There is a newer version of the record available.

Published September 21, 2025 | Version v1

From Discord to Harmony: Consonance-Based Smoothing for Improved Audio Chord Estimation

Description

Audio Chord Estimation (ACE) plays a pivotal role in music information research and has attracted attention for over two decades because of its relevance to music transcription and analysis. Despite notable advances, challenges specific to harmonic content persist, and the performance of existing systems has hit a glass ceiling. These challenges include annotator subjectivity, where differing interpretations among annotators lead to inconsistent annotations, and class imbalance within chord datasets, where some chord classes are over-represented relative to others, which complicates model training and evaluation. As a first contribution, this paper presents a novel methodology for assessing inter-annotator agreement in chord annotations, using metrics that extend beyond traditional binary measures. Our analysis shows that incorporating distance metrics based on perceptual concepts of consonance significantly increases agreement scores. Building on these findings, we introduce a novel conformer-based ACE model that integrates consonance concepts through consonance-based label smoothing. The proposed model also addresses class imbalance by training separate models to detect root, bass, and all note activations, from which chord labels are reconstructed.
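To make the idea of consonance-based label smoothing concrete, here is a minimal sketch. It is not the paper's formulation, which this record does not spell out. It assumes a 12-way pitch-class target (for example, a root detector), a hypothetical table of consonance scores per interval (`CONSONANCE`), and a smoothing weight `epsilon`. The function name and all numeric values are illustrative assumptions.

```python
# Hypothetical consonance score for each interval, in semitones above the
# target pitch class (index 0 = unison, 7 = perfect fifth, 6 = tritone).
# These values are illustrative assumptions, not taken from the paper.
CONSONANCE = [1.0, 0.0, 0.2, 0.6, 0.6, 0.8, 0.0, 0.8, 0.6, 0.6, 0.2, 0.0]


def consonance_smoothed_target(target_pc, epsilon=0.1):
    """Build a soft 12-dimensional label for one pitch-class target.

    The target keeps probability 1 - epsilon. The remaining epsilon is
    shared among the other pitch classes in proportion to their assumed
    consonance with the target, instead of uniformly as in standard
    label smoothing.
    """
    # Consonance of every pitch class relative to the target.
    weights = [CONSONANCE[(pc - target_pc) % 12] for pc in range(12)]
    # The target's own mass is assigned separately below.
    weights[target_pc] = 0.0
    total = sum(weights)
    soft = [epsilon * w / total for w in weights]
    soft[target_pc] = 1.0 - epsilon
    return soft


if __name__ == "__main__":
    # Soft label for a target of C (pitch class 0).
    for pc, p in enumerate(consonance_smoothed_target(0)):
        print(pc, round(p, 4))
```

Under these assumptions, a prediction that confuses the target with a consonant neighbor (such as a perfect fifth away) is penalized less by a cross-entropy loss than one that lands on a dissonant neighbor (such as a semitone away), which is the intuition the abstract describes.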

Files

000057.pdf (442.0 kB)
md5:5e81c89bb2b4512684f297a313e74d5e