MUST-RAG: Multimodal Embedding Alignment for Robust Music Question Answering Under Adversarial Conditions
Description
Recent advances in large language models (LLMs) have yielded remarkable capabilities across diverse domains. Although LLMs exhibit strong zero-shot performance on various tasks, their effectiveness in music-related applications remains limited because music-specific knowledge makes up a relatively small proportion of their training data. To address this limitation, we propose MusT-RAG, a comprehensive framework based on Retrieval-Augmented Generation (RAG) that adapts general-purpose LLMs to text-only music question answering (MQA) tasks. RAG is a technique that provides external knowledge to L
Research goal: How does the alignment of multimodal embeddings (e.g., text and audio) in MUST-RAG affect the consistency and robustness of generated answers when evaluated on adversarial or ambiguous music-related questions?
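As a rough illustration of the retrieve-then-generate pattern that RAG refers to, the following minimal sketch ranks a few passages against a question and builds a prompt from the best match. It is not the MusT-RAG implementation: the bag-of-words `embed` function is a toy stand-in for a real text or audio encoder, and every name here (`embed`, `cosine`, `retrieve`, `build_prompt`, `PASSAGES`) is hypothetical.

```python
import math
from collections import Counter

# Hypothetical two-passage knowledge base, for illustration only.
PASSAGES = [
    "a fugue is a contrapuntal composition built on a recurring subject",
    "the blues scale adds a flattened fifth to the minor pentatonic scale",
]


def embed(text):
    """Toy bag-of-words embedding (assumption: stands in for a real encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, passages, k=1):
    """Return the k passages most similar to the question."""
    q = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


def build_prompt(question, passages, k=1):
    """Prepend the retrieved context to the question before calling an LLM."""
    context = "\n".join(retrieve(question, passages, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

Calling `build_prompt("what is a fugue", PASSAGES)` would place the fugue passage ahead of the question. In a multimodal setting, the same cosine score could be computed between text and audio embeddings as one simple proxy for how well the two spaces are aligned, assuming both encoders map into a shared vector space.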
Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 9.5/10.
Files

| Name | Size |
|---|---|
| paper.pdf (md5:efd145f32a15c0e2fefdc6a56966d06e) | 79.7 kB |