Artifact of AudioHijack (IEEE S&P 2026)

Chen, Meng

doi:10.5281/zenodo.19309781

Published March 29, 2026 | Version 1

Software Open

Chen, Meng (Researcher)¹

1. Zhejiang University

AudioHijack

This is the artifact of the IEEE S&P 2026 paper "Hijacking Large Audio-Language Models via Context-agnostic Auditory Prompt Injection".

Hardware Requirements

CPU memory (RAM) >= 32 GB
GPU memory >= 48 GB
Disk >= 1 TB
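
A quick preflight check against these thresholds can save a failed run. The sketch below is illustrative and not part of the artifact. It assumes the CPU figure refers to system memory, that the disk figure refers to free space, and that RAM can be read through os.sysconf (Linux). The GPU figure is left for the caller to supply (for example from nvidia-smi), since querying it needs a GPU library. All function names are hypothetical.

```python
import os
import shutil

# Thresholds from the requirements list above, in GB (1 TB taken as 1000 GB).
REQUIRED_GB = {"ram": 32, "gpu": 48, "disk": 1000}


def unmet_requirements(measured_gb):
    """Return the names of resources whose measured size is below the threshold."""
    return [name for name, need in REQUIRED_GB.items()
            if measured_gb.get(name, 0) < need]


def measure_ram_gb():
    """Total system memory in GB (page size times physical page count, Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9


def measure_disk_gb(path="."):
    """Free space in GB on the filesystem that holds `path` (an assumption:
    the requirement may instead refer to total capacity)."""
    return shutil.disk_usage(path).free / 1e9


if __name__ == "__main__":
    measured = {
        "ram": measure_ram_gb(),
        "disk": measure_disk_gb("."),
        "gpu": 48,  # placeholder: replace with the memory of your GPU in GB
    }
    print("unmet requirements:", unmet_requirements(measured) or "none")
```

An empty result means every threshold is met; a missing measurement is treated as unmet.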

Software Requirements

GPU Driver: CUDA >= 12.1
Package Manager: uv
LLM API Keys: GPT, Gemini, Qwen

Environment Setup

Download the project to /path/to/AudioHijack

git clone git@github.com:zju-muslab/AudioHijack.git

Then set the root dir of this project in config/run_attack.yaml:

root_dir: /path/to/AudioHijack

Configure api_key variables in .env:

OPENAI_API_KEY=
GOOGLE_API_KEY=
DASHSCOPE_API_KEY=
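
Before starting a long run it may help to confirm that all three keys are actually filled in. The following is a minimal sketch, not part of the artifact: it assumes .env uses plain KEY=VALUE lines as shown above, and the helper names parse_env and missing_keys are hypothetical.

```python
from pathlib import Path

# Key names taken from the .env template above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GOOGLE_API_KEY", "DASHSCOPE_API_KEY")


def parse_env(text):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(text):
    """Return the required keys that are absent or empty in the .env text."""
    values = parse_env(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_file = Path(".env")
    content = env_file.read_text() if env_file.exists() else ""
    print("missing keys:", missing_keys(content) or "none")
```

Run from the project root, this prints the keys that still need a value, or "none" once the file is complete.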

Create a venv environment with necessary packages:

cd /path/to/AudioHijack
pip install uv
uv sync
source .venv/bin/activate

LALM Download

Download LALM weights to weight/lalm, with corresponding encoder weights to weight/encoder and backbone weights to weight/backbone:

LALM	Encoder	Backbone
SpeechGPT	mHuBERT Base
GLM-4-Voice	GLM-4-Voice-tokenizer
VITA-Audio	GLM-4-Voice-tokenizer
Llama-Omni	Whisper-large-v3
Llama-Omni2	Whisper-large-v3
SALMONN	Whisper-large-v2, BEATs	Vicuna-7B-v1.5
Qwen-Audio
Qwen2-Audio
Gemma-3n
Ultravox-v5	Whisper-large-v3-turbo	Llama-3.1-8B-Instruct
Phi-4-Multimodal
Voxtral-mini
Kimi-Audio	GLM-4-Voice-tokenizer, Whisper-large-v3
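
When only a subset of the LALMs is evaluated, the table can be used to work out which extra encoder and backbone weights are needed. The sketch below simply transcribes the table into a dictionary and collects the dependencies for a chosen subset. It is an illustration rather than the artifact's own tooling, the function name components_for is hypothetical, and it makes no assumption about the folder names used under weight/.

```python
# Encoder and backbone dependencies transcribed from the table above.
# LALMs that list no separate encoder or backbone are omitted.
DEPENDENCIES = {
    "SpeechGPT": {"encoder": ["mHuBERT Base"]},
    "GLM-4-Voice": {"encoder": ["GLM-4-Voice-tokenizer"]},
    "VITA-Audio": {"encoder": ["GLM-4-Voice-tokenizer"]},
    "Llama-Omni": {"encoder": ["Whisper-large-v3"]},
    "Llama-Omni2": {"encoder": ["Whisper-large-v3"]},
    "SALMONN": {"encoder": ["Whisper-large-v2", "BEATs"],
                "backbone": ["Vicuna-7B-v1.5"]},
    "Ultravox-v5": {"encoder": ["Whisper-large-v3-turbo"],
                    "backbone": ["Llama-3.1-8B-Instruct"]},
    "Kimi-Audio": {"encoder": ["GLM-4-Voice-tokenizer", "Whisper-large-v3"]},
}


def components_for(lalms):
    """Return (encoders, backbones) needed by the given LALMs, deduplicated and sorted."""
    encoders, backbones = set(), set()
    for name in lalms:
        deps = DEPENDENCIES.get(name, {})
        encoders.update(deps.get("encoder", []))
        backbones.update(deps.get("backbone", []))
    return sorted(encoders), sorted(backbones)


if __name__ == "__main__":
    enc, bb = components_for(["SALMONN", "Kimi-Audio"])
    print("encoders:", enc)
    print("backbones:", bb)
```

Shared components such as GLM-4-Voice-tokenizer and Whisper-large-v3 appear only once in the result, so they need to be downloaded only once.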

Dataset Download

Download audio-text data samples from our Zenodo repo to the data directory.

Attack Evaluation

Run the following script to train and test adversarial audio:

python run_attack.py lalm=voxtral_mini attack=caa

The lalm and attack options can be changed to test other LALMs and attack variants; the available choices are defined in the config directory.
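
To evaluate several combinations in sequence, the command can be wrapped in a small driver. The sketch below is an assumption-laden illustration, not part of the artifact: build_command and run_all are hypothetical helpers, and only the voxtral_mini and caa names come from the command above. Any other name must match a file in the config directory.

```python
import subprocess
import sys


def build_command(lalm, attack):
    """Build the run_attack.py command line for one LALM and attack variant."""
    return [sys.executable, "run_attack.py", f"lalm={lalm}", f"attack={attack}"]


def run_all(pairs, dry_run=True):
    """Run (or, with dry_run, only print) the attack for each (lalm, attack) pair."""
    for lalm, attack in pairs:
        cmd = build_command(lalm, attack)
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sweep at the first failing run.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only this pair is confirmed by the text above; extend the list with
    # names that exist in the config directory.
    run_all([("voxtral_mini", "caa")], dry_run=True)
```

Using sys.executable keeps the sweep inside the activated virtual environment; set dry_run=False to launch the runs.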

Perform behavior match evaluation using OpenAI's batch inference API, following the Jupyter notebook run_judge.ipynb.

Calculate the perception metrics, following the Jupyter notebook run_percept.ipynb.

Note:

Training and testing the attack on one LALM with all 15 target behaviors takes roughly 3 to 10 hours.
The PISR and BMSR across different misbehavior categories are summarized in the output, and detailed results of all attack trials are recorded in exp/attack/${lalm}-${attack}.
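
For planning, the two notes above can be turned into a couple of helper functions. This is a sketch under stated assumptions: it assumes exp/attack lives under the project root and that ${lalm} and ${attack} are the same names passed on the command line. Both function names are hypothetical.

```python
from pathlib import Path


def result_dir(root_dir, lalm, attack):
    """Expected location of the detailed trial records for one run
    (assumes exp/attack is relative to the project root)."""
    return Path(root_dir) / "exp" / "attack" / f"{lalm}-{attack}"


def total_hours(num_lalms, low=3, high=10):
    """Rough lower and upper bound on total run time, at 3 to 10 hours per LALM."""
    return num_lalms * low, num_lalms * high


if __name__ == "__main__":
    print(result_dir("/path/to/AudioHijack", "voxtral_mini", "caa"))
    # The LALM table above lists 13 models.
    print("hours for all 13 LALMs: %d to %d" % total_hours(13))
```

Running every LALM in the table one after another would therefore take on the order of 39 to 130 hours on a single GPU.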

Files

AudioHijack.zip

Files (314.9 MB)

Name	Size
AudioHijack.zip md5:a54e107c82117020bb48efc89a730807	314.9 MB
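
The MD5 digest in the file listing can be used to confirm that the archive downloaded intact. The sketch below uses Python's standard hashlib module; the helper names file_md5 and verify, and the assumption that the archive sits in the current directory, are illustrative.

```python
import hashlib

# Digest copied from the file listing above.
EXPECTED_MD5 = "a54e107c82117020bb48efc89a730807"


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return file_md5(path) == expected.lower()


if __name__ == "__main__":
    import os
    archive = "AudioHijack.zip"  # assumed download location
    if os.path.exists(archive):
        print("checksum ok" if verify(archive) else "checksum mismatch")
    else:
        print("archive not found:", archive)
```

Reading in chunks keeps memory use constant regardless of archive size.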

Additional details

Updated: 2026-03-29

Repository URL: https://github.com/zju-muslab/AudioHijack
