GD-Retriever: Controllable Generative Text-Music Retrieval With Diffusion Models

Julien Guinot; Elio Quinton; George Fazekas

doi:10.5281/zenodo.17706387

There is a newer version of the record available.

Published September 21, 2025 | Version v1

Conference paper Open

Multimodal contrastive models have achieved strong performance in text-audio retrieval and zero-shot settings, but improving joint embedding spaces remains an active research area. Less attention has been given to making these systems controllable and interactive for users. In text-music retrieval, the ambiguity of freeform language creates a many-to-many mapping, often resulting in inflexible or unsatisfying results. We introduce Generative Diffusion Retriever (GDR), a novel framework that leverages diffusion models to generate queries in a retrieval-optimized latent space. This enables controllability through generative tools such as negative prompting and denoising diffusion implicit models (DDIM) inversion, opening a new direction in retrieval control. GDR improves retrieval performance over contrastive teacher models and supports retrieval in audio-only latent spaces using non-jointly trained encoders. Finally, we demonstrate that GDR enables effective post-hoc manipulation of retrieval behavior, enhancing interactive control for text-music retrieval tasks.
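
The abstract names three ingredients: a diffusion model that generates a query in a retrieval latent space, negative prompting, and DDIM. The sketch below illustrates how such pieces could fit together in a toy setting. It is not the authors' implementation. The function names (`guided_noise`, `ddim_step`, `retrieve`), the use of a classifier-free-guidance-style mix for negative prompting, and the cosine-similarity ranking are all assumptions made for illustration. The noise predictions `eps_pos` and `eps_neg` are assumed to come from some text-conditioned denoiser that is not shown.

```python
import numpy as np


def guided_noise(eps_pos, eps_neg, scale):
    # Assumed negative-prompting rule (classifier-free-guidance style):
    # start from the negative-prompt prediction and move `scale` times
    # the way toward the positive-prompt prediction.
    return eps_neg + scale * (eps_pos - eps_neg)


def ddim_step(x_t, eps, alpha_t, alpha_prev):
    # Deterministic DDIM update (eta = 0). alpha_t and alpha_prev are
    # cumulative signal-retention products in (0, 1].
    x0_pred = (x_t - np.sqrt(1.0 - alpha_t) * eps) / np.sqrt(alpha_t)
    return np.sqrt(alpha_prev) * x0_pred + np.sqrt(1.0 - alpha_prev) * eps


def retrieve(query, catalog, k=3):
    # Rank catalog embeddings (one per row) by cosine similarity to the
    # generated query embedding and return the top-k row indices.
    q = query / np.linalg.norm(query)
    c = catalog / np.linalg.norm(catalog, axis=1, keepdims=True)
    return np.argsort(-(c @ q))[:k]
```

In a full sampling loop, one would call `guided_noise` and `ddim_step` at each timestep, starting from Gaussian noise, and pass the final latent to `retrieve` against precomputed audio embeddings. Running the same deterministic update in the opposite direction is the idea behind DDIM inversion, which the abstract mentions as a second control mechanism.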

Files

000030.pdf (757.7 kB), md5:199de41a121a960852fdca3a336d0697

Resource type: Conference paper
Publisher: ISMIR
Imprint: Proceedings of the 26th International Society for Music Information Retrieval Conference, 276-284. Daejeon, South Korea.
Conference: International Society for Music Information Retrieval Conference (ISMIR 2025), Daejeon, South Korea and Online, September 21-25, 2025

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: November 25, 2025
Modified: November 25, 2025
