
Published June 6, 2025 | Version v2

Prompt Obfuscation for Large Language Models - USENIX Security '25 Cycle 2 #798 Artifact Evaluation

  • 1. Helmholtz Center for Information Security
  • 2. Technische Universität Berlin
  • 3. Berlin Institute for the Foundations of Learning and Data

Description

Artifact for the paper: "Prompt Obfuscation for Large Language Models"

This artifact contains the code to reproduce the results presented in the paper. The code allows users to perform and evaluate prompt obfuscation as an alternative method to traditional system prompting for large language models. Furthermore, different deobfuscation methods are included.

The main components to reproduce the results of the paper are the following groups of scripts:

  1. obfuscate.py and evaluate_obfuscation.py to obfuscate system prompts and evaluate them (Sections 5.1, 5.2, and 5.3 results).
  2. finetune.py and evaluate_finetuning.py to fine-tune LoRA adapters and evaluate them (Section 5.4 results).
  3. prompt_extraction.py and evaluate_prompt_extraction.py to extract the system prompt and evaluate the success rate (Section 6.1 results).
  4. projection.py to project embedded (soft) prompts back to token space (Section 6.2 results).
  5. fluency_deobfuscation.py and evaluate_fluency_deobfuscation.py to deobfuscate obfuscated system prompts using fluency optimization and evaluate them (Section 6.3 results).
  6. Several helper scripts (generate_output.py, compare_output.py, compare_sys_prompts.py) to quickly generate model outputs, compare outputs, and compare system prompts for evaluation and baseline comparisons.

Project Structure

prompt_obfuscation
├── README.md
├── compare_output.py
├── compare_sys_prompts.py
├── data
│   ├── __init__.py
│   ├── config.py
│   ├── loader.py
│   └── utils.py
├── evaluate_finetuning.py
├── evaluate_fluency_deobfuscation.py
├── evaluate_obfuscation.py
├── evaluate_prompt_extraction.py
├── extraction_prompts
│   └── gpt4_generated.json
├── finetune.py
├── fluency_deobfuscation.py
├── generate_output.py
├── obfuscate.py
├── projection.py
├── prompt_extraction.py
├── requirements.txt
└── src
    ├── __init__.py
    ├── finetuning_utils.py
    ├── logging_config.py
    ├── model.py
    ├── output_generation.py
    ├── output_similarity.py
    ├── prompt_utils.py
    ├── style_prompts.py
    ├── sys_prompt_similarity.py
    └── utils.py

The data/ directory handles dataset loading and processing. The src/ directory contains the core logic for models, generation, and evaluation. The extraction_prompts/ directory contains the prompts used for the prompt extraction attack. The Python scripts in the root directory are used to run the experiments.
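To give a feel for what the comparison helpers do, a minimal output-similarity check can be sketched with the standard library. The metrics actually used by src/output_similarity.py are likely more sophisticated; the function name and example strings below are illustrative assumptions:

```python
# Illustrative output comparison in the spirit of compare_output.py;
# the artifact's real similarity metrics may differ.
from difflib import SequenceMatcher

def output_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two model outputs."""
    return SequenceMatcher(None, a, b).ratio()

base = "The capital of France is Paris."
obf  = "The capital of France is Paris!"
print(round(output_similarity(base, obf), 2))  # -> 0.97
```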

Setup

A GPU is highly recommended for reasonable computation times.

  1. Create a Python 3.12.7 environment (e.g., using conda): 
    conda create -n prompt_obfuscation python=3.12.7
    conda activate prompt_obfuscation
  2. Install the required packages:
    pip install -r requirements.txt
  3. Hugging Face Access: The main model used (Llama-3.1-8B) requires a Hugging Face account with access granted to the model. Log in via the command line after requesting access on the model's page:
    huggingface-cli login


Please see the README.md file for example usage and a full list of all command-line arguments.

Files

prompt_obfuscation.zip (74.0 kB)
md5:ab92ca54c6ae8528875d64db2bb1d494