
Published June 6, 2025 | Version v2

Prompt Obfuscation for Large Language Models - USENIX Security '25 Cycle 2 #798 Artifact Evaluation

  • 1. Helmholtz Center for Information Security
  • 2. Technische Universität Berlin
  • 3. Berlin Institute for the Foundations of Learning and Data

Description

Artifact for the paper: "Prompt Obfuscation for Large Language Models"

This artifact contains the code to reproduce the results presented in the paper. The code allows users to perform and evaluate prompt obfuscation as an alternative method to traditional system prompting for large language models. Furthermore, different deobfuscation methods are included.

The main components to reproduce the results of the paper are the following groups of scripts:

  1. obfuscate.py and evaluate_obfuscation.py to obfuscate system prompts and evaluate them (Sections 5.1, 5.2, and 5.3 results).
  2. finetune.py and evaluate_finetuning.py to fine-tune LoRA adapters and evaluate them (Section 5.4 results).
  3. prompt_extraction.py and evaluate_prompt_extraction.py to extract the system prompt and evaluate the success rate (Section 6.1 results).
  4. projection.py to project embedded (soft) prompts back to token space (Section 6.2 results).
  5. fluency_deobfuscation.py and evaluate_fluency_deobfuscation.py to deobfuscate obfuscated system prompts using fluency optimization and evaluate them (Section 6.3 results).
  6. Several helper scripts (generate_output.py, compare_output.py, compare_sys_prompts.py) to quickly generate model outputs, compare outputs, and compare system prompts for evaluation and baseline comparisons.

Project Structure

prompt_obfuscation
├── README.md
├── compare_output.py
├── compare_sys_prompts.py
├── data
│   ├── __init__.py
│   ├── config.py
│   ├── loader.py
│   └── utils.py
├── evaluate_finetuning.py
├── evaluate_fluency_deobfuscation.py
├── evaluate_obfuscation.py
├── evaluate_prompt_extraction.py
├── extraction_prompts
│   └── gpt4_generated.json
├── finetune.py
├── fluency_deobfuscation.py
├── generate_output.py
├── obfuscate.py
├── projection.py
├── prompt_extraction.py
├── requirements.txt
└── src
    ├── __init__.py
    ├── finetuning_utils.py
    ├── logging_config.py
    ├── model.py
    ├── output_generation.py
    ├── output_similarity.py
    ├── prompt_utils.py
    ├── style_prompts.py
    ├── sys_prompt_similarity.py
    └── utils.py

The data/ directory handles dataset loading and processing. The src/ directory contains the core logic for models, generation, and evaluation. The extraction_prompts/ directory contains the prompts used for the prompt extraction attack. The Python scripts in the root directory are used to run the experiments.
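To give a feel for what the comparison helpers do, a minimal output-similarity check can be sketched with the standard library. The metrics actually used by src/output_similarity.py are likely more sophisticated; the function name and example strings below are illustrative assumptions:

```python
# Illustrative output comparison in the spirit of compare_output.py;
# the artifact's real similarity metrics may differ.
from difflib import SequenceMatcher

def output_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two model outputs."""
    return SequenceMatcher(None, a, b).ratio()

base = "The capital of France is Paris."
obf  = "The capital of France is Paris!"
print(round(output_similarity(base, obf), 2))  # -> 0.97
```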

Setup

A GPU is highly recommended for reasonable computation times.

  1. Create a Python 3.12.7 environment (e.g., using conda): 
    conda create -n prompt_obfuscation python=3.12.7
    conda activate prompt_obfuscation
  2. Install the required packages:
    pip install -r requirements.txt
  3. Hugging Face Access: The main model used (Llama-3.1-8B) requires a Hugging Face account with access granted to the model. Log in via the command line after requesting access on the model's page:
    huggingface-cli login


Please see the README.md file for example usage and a full list of all command-line arguments.

Files

prompt_obfuscation.zip (74.0 kB)
md5:ab92ca54c6ae8528875d64db2bb1d494