Artifact of AudioHijack (IEEE S&P 2026)
Description
AudioHijack
This is the artifact of the IEEE S&P 2026 paper "Hijacking Large Audio-Language Models via Context-agnostic Auditory Prompt Injection".
Hardware Requirements
- CPU >= 32GB
- GPU >= 48GB
- Disk >= 1T
Software Requirements
- GPU Driver: CUDA >= 12.1
- Package Manager: uv
- LLM API Keys: GPT, Gemini, Qwen
Enviroment Setup
- Download the project to
/path/to/AudioHijack
git clone git@github.com:zju-muslab/AudioHijack.git
Then set the root dir of this project in config/run_attack.yaml:
root_dir: /path/to/AudioHijack
- Configure api_key variables in
.env:
OPENAI_API_KEY=
GOOGLE_API_KEY=
DASHSCOPE_API_KEY=
- Create a venv environment with necessary packages:
cd /path/to/AudioHijack
pip install uv
uv sync
source .venv/bin/activate
LALM Download
Download LALM weights to weight/lalm, with corresponding encoder weights to weight/encoder and backbone weights to weight/backbone:
Dataset Download
Download audio-text data samples from our Zenodo repo to the data directory.
Attack Evaluation
- Run the following script to train and test adversarial audio:
python run_attack.py lalm=voxtral_mini attack=caa
The lalm and attack options can be modified to test different lalms and attack variants in the config directory.
-
Perform behavior match evaluation using OpenAI's batch inference API, following the jupyter notebook
run_judge.ipynb.
-
Calculate the perception metrics, following the jupyter notebook
run_percept.ipynb
Note:
- It will take 3~10 hours for training and testing the attack on an LALM with all 15 target behaviors.
- The PISR and BMSR across different misbehavior categories are summarized in the output, and detailed result of all attack trials are recorded in
exp/attack/${lalm}-${attack}.
Files
AudioHijack.zip
Files
(314.9 MB)
| Name | Size | Download all |
|---|---|---|
|
md5:a54e107c82117020bb48efc89a730807
|
314.9 MB | Preview Download |
Additional details
Dates
- Updated
-
2026-03-29
Software
- Repository URL
- https://github.com/zju-muslab/AudioHijack