Prompt Obfuscation for Large Language Models - USENIX Security '25 Cycle 2 #798 Artifact Evaluation
Description
Artifact for the paper: "Prompt Obfuscation for Large Language Models"
This artifact contains the code to reproduce the results presented in the paper. The code allows users to perform and evaluate prompt obfuscation as an alternative method to traditional system prompting for large language models. Furthermore, different deobfuscation methods are included.
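Conceptually, prompt obfuscation searches for a replacement (soft) prompt whose induced model outputs match those of the original system prompt, without the replacement revealing the original. The following toy sketch substitutes a small fixed linear map for the language model purely for illustration; the matrix, dimensions, and optimization settings are made up and are not the artifact's actual procedure:

```python
import numpy as np

# Toy stand-in for a language model: a fixed linear map from a 4-dim
# prompt embedding to a 2-dim "output". Everything here is illustrative;
# the artifact optimizes real soft prompts against a real LLM.
W = np.array([[1.0, 0.0, 0.5, 0.0],
              [0.0, 1.0, 0.0, 0.5]])
orig_prompt = np.array([1.0, -2.0, 0.5, 3.0])  # "real" system prompt embedding
target = W @ orig_prompt                       # outputs to reproduce

# Gradient descent on a soft prompt so its outputs match the original's.
soft = np.zeros(4)
for _ in range(200):
    grad = 2 * W.T @ (W @ soft - target)  # gradient of ||W @ soft - target||^2
    soft -= 0.1 * grad

# The optimized prompt reproduces the outputs while being a different
# vector than the original prompt: the core idea of prompt obfuscation.
print(np.allclose(W @ soft, target))   # True: outputs match
print(np.allclose(soft, orig_prompt))  # False: prompts differ
```

Because the toy "model" is underdetermined, many prompts produce the same outputs, which is what makes an output-matching replacement possible.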
The main components to reproduce the results of the paper are the following groups of scripts:
- obfuscate.py and evaluate_obfuscation.py to obfuscate system prompts and evaluate them (Sections 5.1, 5.2, and 5.3 results).
- finetune.py and evaluate_finetuning.py to finetune LoRA adapters and evaluate them (Section 5.4 results).
- prompt_extraction.py and evaluate_prompt_extraction.py to extract the system prompt and evaluate the success rate (Section 6.1 results).
- projection.py to project embedded (soft) prompts back to token space (Section 6.2 results).
- fluency_deobfuscation.py and evaluate_fluency_deobfuscation.py to deobfuscate obfuscated system prompts using fluency optimization and evaluate them (Section 6.3 results).
- Several helper scripts (generate_output.py, compare_output.py, compare_sys_prompts.py) to quickly generate and compare outputs, and to compare system prompts for evaluation and baseline comparisons.
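Projecting a soft prompt back to token space can be pictured as finding, for each optimized embedding vector, the closest token in the model's embedding matrix. A minimal nearest-neighbor sketch follows; the function name, the use of cosine similarity, and the tiny vocabulary are all assumptions for illustration, not necessarily what projection.py implements:

```python
import numpy as np

def project_to_tokens(soft_prompt, embedding_matrix):
    """Map each soft-prompt vector to the vocabulary token whose
    embedding is closest in cosine similarity (one plausible choice;
    the artifact's projection.py may use a different distance)."""
    # Normalize rows so that dot products equal cosine similarities.
    emb = embedding_matrix / np.linalg.norm(embedding_matrix, axis=1, keepdims=True)
    soft = soft_prompt / np.linalg.norm(soft_prompt, axis=1, keepdims=True)
    return (soft @ emb.T).argmax(axis=1)  # one token id per soft vector

# Hypothetical vocabulary of 5 token embeddings in 3 dimensions.
vocab = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0]])
soft_prompt = np.array([[0.9, 0.1, 0.0],
                        [0.1, 0.1, 0.8]])
print(project_to_tokens(soft_prompt, vocab))  # -> [0 2]
```

Each soft vector snaps to its nearest vocabulary entry, yielding a hard token sequence that approximates the optimized soft prompt.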
Project Structure
prompt_obfuscation
├── README.md
├── compare_output.py
├── compare_sys_prompts.py
├── data
│ ├── __init__.py
│ ├── config.py
│ ├── loader.py
│ └── utils.py
├── evaluate_finetuning.py
├── evaluate_fluency_deobfuscation.py
├── evaluate_obfuscation.py
├── evaluate_prompt_extraction.py
├── extraction_prompts
│ └── gpt4_generated.json
├── finetune.py
├── fluency_deobfuscation.py
├── generate_output.py
├── obfuscate.py
├── projection.py
├── prompt_extraction.py
├── requirements.txt
└── src
├── __init__.py
├── finetuning_utils.py
├── logging_config.py
├── model.py
├── output_generation.py
├── output_similarity.py
├── prompt_utils.py
├── style_prompts.py
├── sys_prompt_similarity.py
└── utils.py
The data/ directory handles dataset loading and processing. The src/ directory contains core logic for models, generation, and evaluation. The extraction_prompts/ directory contains extraction prompts for the prompt extraction attack. The Python scripts in the root directory are used to run the experiments.
Setup
A GPU is highly recommended for reasonable computation times.
- Create a Python 3.12.7 environment (e.g., using conda):
conda create -n prompt_obfuscation python=3.12.7
conda activate prompt_obfuscation
- Install the required packages:
pip install -r requirements.txt
- Hugging Face Access: The main model used (Llama-3.1-8B) requires a Hugging Face account with access granted to the model. Log in via the command line after requesting access on the model's page:
huggingface-cli login
Please see the README.md file for example usage and a full list of all command-line arguments.
Files
prompt_obfuscation.zip (74.0 kB, md5:ab92ca54c6ae8528875d64db2bb1d494)