Published February 6, 2025 | Version v1
Conference paper Open

ADIFF: Explaining audio difference using natural language

  • 1. Microsoft (United States)
  • 2. Carnegie Mellon University

Description

ADIFF is an audio language model based on prefix tuning. It includes a cross-projection module and is trained in a three-step process. ADIFF takes two audio recordings and a text prompt as input and produces difference explanations at different tiers as output. Explaining the difference involves identifying and describing audio events, acoustic scenes, signal characteristics, and their emotional impact on listeners.

The code repository is: soham97/ADIFF

Files (2.4 GB)

md5:37ae169cb2706e0a0976aec73aeb5b82 (1.2 GB)
md5:3f61dd7937d43f064a2130bc2fbe5c80 (1.2 GB)
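
Because each file is about 1.2 GB, it can be worth checking a download against the MD5 checksums listed above before using it. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `matches_record` are hypothetical, and since the file names are not shown in this record, the path you pass in is whatever name the download was saved under.

```python
import hashlib

# MD5 checksums as listed in this record. The record does not show which
# file name belongs to which checksum, so they are kept as an unordered set.
EXPECTED_MD5 = {
    "37ae169cb2706e0a0976aec73aeb5b82",
    "3f61dd7937d43f064a2130bc2fbe5c80",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that a multi-gigabyte download never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path):
    """Return True if the file's MD5 equals one of the listed checksums."""
    return md5_of_file(path) in EXPECTED_MD5
```

Calling `matches_record` with the local path of a downloaded file returns `True` only if the file's digest equals one of the two listed values. A `False` result usually points to an incomplete or corrupted download.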

Additional details

Software

Repository URL
https://github.com/soham97/ADIFF
Programming language
Python