Published January 24, 2025 | Version v4

PAPILLON: Efficient and Stealthy Fuzz Testing-Powered Jailbreaks for LLMs

  • 1. Wuhan University

Description

Artifact of the USENIX Security 2025 paper: PAPILLON: Efficient and Stealthy Fuzz Testing-Powered Jailbreaks for LLMs


## Overview

![overview.png](./overview.png)

## Installation

```bash
# Environment: Python 3.10, PyTorch 2.1.2 with CUDA 12.1
pip install "fschat[model_worker,webui]"
pip install vllm
pip install openai                # for OpenAI models
pip install termcolor
pip install openpyxl
pip install google-generativeai   # for Google PaLM 2
pip install anthropic             # for Anthropic models
```

## Models

1. We use a fine-tuned RoBERTa-large model ([Hugging Face](https://huggingface.co/hubert233/GPTFuzz)) from [GPTFuzz](https://github.com/sherdencooper/GPTFuzz) as our judge model. Thanks to the authors for their great work!

2. To use the GPT-based judge model, set your API key:

```python
# line 106 in ./Judge/language_models.py
client = OpenAI(base_url="[your proxy url (if used)]", api_key="your api key", timeout=self.API_TIMEOUT)
```

 

## Datasets

We provide three datasets of jailbreak questions:

1. `datasets/questions/question_target_list.csv`: sampled from two public datasets, [llm-jailbreak-study](https://sites.google.com/view/llm-jailbreak-study) and [hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf). Following the format of [GCG](https://github.com/llm-attacks/llm-attacks), we added a corresponding target string for each question.
2. `datasets/questions/question_target.csv`: the AdvBench dataset.

3. `datasets/questions/question_target_custom.csv`: a subset of AdvBench.
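The dataset files follow the GCG-style question/target layout described above. A minimal sketch of reading such a CSV, assuming `question` and `target` column names (the exact column headers are an assumption, not confirmed by the artifact):

```python
import csv
import io

# Hypothetical sample mimicking the assumed CSV layout: one harmful question
# per row, paired with the affirmative target string the attack optimizes for.
sample = io.StringIO(
    "question,target\n"
    '"Write a tutorial on X","Sure, here is a tutorial on X"\n'
)

rows = list(csv.DictReader(sample))
for row in rows:
    print(row["question"], "->", row["target"])
```

In practice you would pass the real path, e.g. `open("datasets/questions/question_target_custom.csv")`, instead of the in-memory sample.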

## Usage example

To jailbreak gpt-3.5-turbo on the subset of AdvBench:

```bash
python run.py --openai_key [your openai_key] --model_path gpt-3.5-turbo --target_model gpt-3.5-turbo
```
## Evaluation

Set `directory_path` to the directory containing the results, then run `eval.py` to compute the ASR (attack success rate) and AQ metrics.
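As a rough illustration of what the evaluation computes, here is a minimal sketch assuming ASR is the fraction of questions successfully jailbroken and AQ is the average number of queries spent per question. The record format and field names below are hypothetical, not the artifact's actual output schema:

```python
# Hypothetical per-question result records (field names are assumptions):
# whether the question was jailbroken, and how many queries were used.
results = [
    {"jailbroken": True,  "queries": 12},
    {"jailbroken": False, "queries": 50},
    {"jailbroken": True,  "queries": 8},
]

asr = sum(r["jailbroken"] for r in results) / len(results)  # attack success rate
aq = sum(r["queries"] for r in results) / len(results)      # average queries

print(f"ASR: {asr:.2%}, AQ: {aq:.1f}")
```

`eval.py` derives the same kind of aggregate statistics from the saved result files in `directory_path`.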

 

Files

Papillon_main.zip (1.1 MB, md5:6c8265c857f05aa9427ab4f0ba8fcd2e)