Training a Perceptual Model for Evaluating Auditory Similarity in Music Adversarial Attack

Liu, Yuxuan; Sang, Rui; Zhang, Peihong; Li, Zhixin; Li, Shengchen

doi:10.5281/zenodo.17496542

Published November 3, 2025 | Version v1

Conference paper (open access)

Music Information Retrieval (MIR) systems are highly vulnerable to adversarial attacks that are often imperceptible to humans, primarily because model feature spaces are misaligned with human auditory perception. Existing defenses and perceptual metrics frequently fail to capture these auditory nuances; our initial listening tests support this, showing low correlation between common metrics and human judgments. To bridge this gap, we introduce the Perceptually-Aligned MERT Transformer (PAMT), a novel framework for learning robust, perceptually aligned music representations. Our core innovation is the psychoacoustically-conditioned sequential contrastive transformer, a lightweight projection head built atop a frozen MERT encoder. PAMT achieves a Spearman correlation coefficient of 0.65 with subjective scores, outperforming existing perceptual metrics. Our approach also improves robust accuracy by 9.15% on average on challenging MIR tasks, including Cover Song Identification and Music Genre Classification, under diverse perceptual adversarial attacks. This work pioneers architecturally integrated psychoacoustic conditioning, yielding representations that are significantly more aligned with human perception and more robust against music adversarial attacks.
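The abstract evaluates perceptual alignment with a Spearman correlation between a metric's outputs and subjective listening scores. As a minimal sketch of what that statistic measures, the following computes Spearman's rho using only the standard library. The function names and the toy data are assumptions made for illustration; they are not taken from the paper, and the numbers are not its results.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share one averaged rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical toy data, invented for this example (not from the paper).
human_scores = [4.5, 3.0, 1.5, 4.0, 2.0]        # assumed listener ratings
metric_scores = [0.91, 0.70, 0.20, 0.60, 0.35]  # assumed metric outputs
rho = spearman(human_scores, metric_scores)
print(f"Spearman rho = {rho:.2f}")  # prints "Spearman rho = 0.90"
```

Because the statistic depends only on rank order, a metric can score well without being linearly related to listener ratings; it only has to order the audio pairs the way listeners do.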

Files

Name	Size
CMMR2025_P2_11.pdf (md5:412839ecea4c9e1754826c60b777b1c1)	1.2 MB

Resource type

Conference paper

Publisher

Zenodo

Imprint

Proceedings of the 17th International Symposium on Computer Music Multidisciplinary Research, 748-759. London, United Kingdom. ISBN: 979-10-97498-06-1.

Conference

17th International Symposium on Computer Music Multidisciplinary Research (CMMR 2025), London, United Kingdom, 3-7 November 2025

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: November 3, 2025
Modified: November 3, 2025
