SoK: Robustness in Large Language Models against Jailbreak Attacks - IEEE S&P 2026 Cycle 1 #1329 Artifact Evaluation
Description
Artifact for IEEE S&P Submission
This repository contains the artifact for our submission to the IEEE Symposium on Security and Privacy (S&P), titled "SoK: Robustness in Large Language Models against Jailbreak Attacks". It includes the materials, code, and benchmarks used for our evaluation of jailbreak attacks and defenses in Large Language Models (LLMs).
Project Overview 🚀
Large Language Models (LLMs) have demonstrated great success but remain vulnerable to jailbreak attacks that manipulate prompts to induce harmful or policy-violating outputs. These attacks challenge the safety and trust of LLMs in high-stakes applications. Current evaluation frameworks are insufficient, relying mainly on metrics like attack success rate that fail to capture the full complexity of LLM security.
This project introduces Security Cube, a unified framework designed to evaluate and compare attack and defense methods comprehensively. By organizing attacks, defenses, and automated judges into a structured taxonomy, Security Cube provides a multi-dimensional approach for assessing LLM robustness. We benchmark over 13 representative attacks and 5 defenses, offering insights into the current landscape and highlighting key findings, open issues, and future directions to improve LLM robustness, interpretability, and trustworthiness.
Key Features ✨
Security Cube provides a comprehensive framework to evaluate the robustness of LLMs against jailbreak attacks. Unlike traditional evaluation metrics, such as attack success rate (ASR), which offer a limited view, Security Cube takes a multi-dimensional approach, allowing you to assess attack effectiveness, defense robustness, and model security in a more holistic manner.
- Comprehensive Evaluation: Evaluate LLM security using over 13 representative jailbreak attacks and 5 defenses, covering multiple dimensions of attack and defense characteristics.
- Benchmarking: Conduct in-depth experiments comparing various attacks and defenses, offering insights into the current landscape of LLM robustness.
- Scalable and Flexible: Easily integrate new attacks, defenses, and models to extend benchmarking capabilities.
- Multiple Metrics: Use a range of evaluation metrics to capture a fuller picture of attack and defense performance.
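For reference, the attack success rate (ASR) mentioned above is conventionally the fraction of attack attempts that a judge marks as successful. The snippet below is a minimal sketch of that single metric, assuming each attempt is recorded as a boolean verdict; the function name and data layout are illustrative and are not taken from the Security Cube codebase.

```python
from typing import Sequence


def attack_success_rate(verdicts: Sequence[bool]) -> float:
    """Return the fraction of attempts judged to be successful jailbreaks.

    `verdicts` is a hypothetical record: one boolean per attack attempt,
    True when the judge considered the attempt successful.
    """
    if not verdicts:
        raise ValueError("at least one attempt is required")
    # True counts as 1 and False as 0, so the sum is the number of successes.
    return sum(verdicts) / len(verdicts)


if __name__ == "__main__":
    # Four attempts, two judged successful.
    print(attack_success_rate([True, False, True, False]))
```

Because every attempt collapses into one ratio, two attacks with very different behavior can report the same ASR, which is the limited view the description refers to.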
Files
- Size: 481.5 MB
- MD5: 1454cbf20b6186d9dcebea8eb22f5812
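The MD5 checksum listed above can be used to confirm that the download completed without corruption. The following is a minimal sketch using Python's standard `hashlib`; the helper names are illustrative, and the local path to the downloaded file is something you supply, since the archive name is not shown on this page.

```python
import hashlib

# Checksum published in the file listing of this record.
EXPECTED_MD5 = "1454cbf20b6186d9dcebea8eb22f5812"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that a large archive does not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path: str) -> bool:
    """Return True when the file at `path` has the published MD5."""
    return md5_of_file(path) == EXPECTED_MD5
```

Call `matches_published_checksum` with the path of the downloaded archive; a `False` result suggests the file should be downloaded again.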
Additional details
Dates
- Created: 2025-09
Software
- Repository URL: https://github.com/XOTaichi/Security-Cube-Artifact
References
- @inproceedings{jailbreaksok2026,
    title={SoK: Robustness in Large Language Models against Jailbreak Attacks},
    author={Xu, Feiyue and Hu, Hongsheng and He, Chaoxiang and Hang, Sheng and Hu, Hanqing and Liu, Xiuming and Zhao, Yubo and Zhou, Zhengyan and Zhu, Bin Benjamin and Sun, Shi-Feng and Gu, Dawu and Wang, Shuo},
    booktitle={2026 IEEE Symposium on Security and Privacy (SP)},
    year={2026},
    organization={IEEE}
  }