Published September 16, 2025 | Version v4
Software Open

SoK: Robustness in Large Language Models against Jailbreak Attacks - IEEE S&P 2026 Cycle 1 #1329 Artifact Evaluation

Description

Artifact for IEEE S&P Submission

This repository contains the artifact for our submission to the IEEE Symposium on Security and Privacy (S&P), titled "SoK: Robustness in Large Language Models against Jailbreak Attacks". It includes the materials, code, and benchmarks used for our evaluation of jailbreak attacks and defenses in Large Language Models (LLMs).

Project Overview 🚀

LLMs have been widely successful, yet they remain vulnerable to jailbreak attacks: manipulated prompts that induce harmful or policy-violating outputs. These attacks undermine the safety and trustworthiness of LLMs in high-stakes applications. Existing evaluation frameworks fall short because they rely mainly on single metrics such as attack success rate, which do not capture the full complexity of LLM security.

This project introduces Security Cube, a unified framework designed to evaluate and compare attack and defense methods comprehensively. By organizing attacks, defenses, and automated judges into a structured taxonomy, Security Cube provides a multi-dimensional approach for assessing LLM robustness. We benchmark over 13 representative attacks and 5 defenses, offering insights into the current landscape and highlighting key findings, open issues, and future directions to improve LLM robustness, interpretability, and trustworthiness.

Key Features ✨

Security Cube evaluates the robustness of LLMs against jailbreak attacks along multiple dimensions. A single metric such as attack success rate (ASR) offers only a limited view; Security Cube instead lets you assess attack effectiveness, defense robustness, and model security together.

  • Comprehensive Evaluation: Evaluate LLM security using over 13 representative jailbreak attacks and 5 defenses, covering multiple dimensions of attack and defense characteristics.

  • Benchmarking: Conduct in-depth experiments comparing various attacks and defenses, offering insights into the current landscape of LLM robustness.

  • Scalable and Flexible: Easily integrate new attacks, defenses, and models to extend benchmarking capabilities.

  • Multiple Metrics: Use a range of evaluation metrics to capture a fuller picture of attack and defense performance.
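To make the ASR baseline concrete, here is a minimal sketch of how attack success rate can be computed from judge verdicts, both overall and per attack/defense pair. The `Verdict` record and both function names are hypothetical illustrations and are not taken from the Security Cube codebase; the sketch only assumes that each attempt has been labeled by a judge as jailbroken or not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """One judged attack attempt (hypothetical record layout)."""
    attack: str       # name of the attack method
    defense: str      # name of the defense, or "none"
    jailbroken: bool  # judge's decision for this attempt


def attack_success_rate(verdicts):
    """Return the fraction of attempts the judge marked as jailbroken."""
    verdicts = list(verdicts)
    if not verdicts:
        raise ValueError("no verdicts to score")
    return sum(v.jailbroken for v in verdicts) / len(verdicts)


def asr_by_pair(verdicts):
    """Return ASR for each (attack, defense) pair separately."""
    groups = {}
    for v in verdicts:
        groups.setdefault((v.attack, v.defense), []).append(v)
    return {pair: attack_success_rate(vs) for pair, vs in groups.items()}
```

With two undefended attempts of which one succeeds, and two defended attempts that both fail, the overall ASR is 0.25 while the per-pair values are 0.5 and 0.0. A single aggregate number hides exactly this kind of difference, which is one reason to report more than one metric.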

 

Files

Total size: 481.5 MB
md5:1454cbf20b6186d9dcebea8eb22f5812
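Since the record lists an MD5 digest for the download, the archive can be checked for corruption after fetching it. The sketch below uses only Python's standard `hashlib`; the function names are illustrative, and the path you pass in is whatever name the downloaded file has on your machine (the record text here does not state it). MD5 is suitable for detecting transfer errors, not for defending against deliberate tampering.

```python
import hashlib

# Digest as listed on the record page.
EXPECTED_MD5 = "1454cbf20b6186d9dcebea8eb22f5812"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

Reading in chunks keeps memory use flat, which matters for a 481.5 MB archive.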

Additional details

Dates

Created
2025-09

References

  • @inproceedings{jailbreaksok2026,
      title={SoK: Robustness in Large Language Models against Jailbreak Attacks},
      author={Xu, Feiyue and Hu, Hongsheng and He, Chaoxiang and Hang, Sheng and Hu, Hanqing and Liu, Xiuming and Zhao, Yubo and Zhou, Zhengyan and Zhu, Bin Benjamin and Sun, Shi-Feng and Gu, Dawu and Wang, Shuo},
      booktitle={2026 IEEE Symposium on Security and Privacy (SP)},
      year={2026},
      organization={IEEE}
    }