SoK: Robustness in Large Language Models against Jailbreak - IEEE S&P 2026 Cycle 1 #1329 Artifact Evaluation

Xu, Feiyue; Hu, Hongsheng; He, Chaoxiang; Hang, Sheng; Hu, Hanqing; Liu, Xiuming; Zhao, Yubo; Zhou, Zhengyan; Zhu, Bin; Sun, Shi-Feng; Gu, Dawu; Wang, Shuo

doi:10.5281/zenodo.17163312

Published September 16, 2025 | Version v4

Software Open

1. Shanghai Jiao Tong University
2. Microsoft Research Asia (China)

Artifact for IEEE S&P Submission

This repository contains the artifact for our submission to the IEEE Symposium on Security and Privacy (S&P), titled "SoK: Robustness in Large Language Models against Jailbreak Attacks". It includes the materials, code, and benchmarks used for our evaluation of jailbreak attacks and defenses in Large Language Models (LLMs).

Project Overview 🚀

Large Language Models (LLMs) have demonstrated great success but remain vulnerable to jailbreak attacks, which manipulate prompts to induce harmful or policy-violating outputs. These attacks undermine the safety and trustworthiness of LLMs in high-stakes applications. Current evaluation frameworks rely mainly on metrics such as attack success rate (ASR), which fail to capture the full complexity of LLM security.

This project introduces Security Cube, a unified framework designed to evaluate and compare attack and defense methods comprehensively. By organizing attacks, defenses, and automated judges into a structured taxonomy, Security Cube provides a multi-dimensional approach for assessing LLM robustness. We benchmark over 13 representative attacks and 5 defenses, offering insights into the current landscape and highlighting key findings, open issues, and future directions to improve LLM robustness, interpretability, and trustworthiness.

Key Features ✨

Security Cube evaluates the robustness of LLMs against jailbreak attacks. Traditional metrics such as attack success rate (ASR) offer only a limited view; Security Cube instead takes a multi-dimensional approach that assesses attack effectiveness, defense robustness, and model security together.

Comprehensive Evaluation: Evaluate LLM security using over 13 representative jailbreak attacks and 5 defenses, covering multiple dimensions of attack and defense characteristics.
Benchmarking: Conduct in-depth experiments comparing various attacks and defenses, offering insights into the current landscape of LLM robustness.
Scalable and Flexible: Easily integrate new attacks, defenses, and models to extend benchmarking capabilities.
Multiple Metrics: Use a range of evaluation metrics to capture a fuller picture of attack and defense performance.
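The description names attack success rate (ASR) as the traditional baseline metric. As a minimal sketch of that one metric, and not code taken from the artifact, ASR can be read as the fraction of attack attempts that a judge labels successful. The function name `attack_success_rate` and the list-of-booleans input are assumptions made for illustration; Security Cube's own metrics and judge interfaces may differ.

```python
from typing import Sequence


def attack_success_rate(judgements: Sequence[bool]) -> float:
    """Return the fraction of attack attempts a judge marked as successful.

    `judgements` holds one boolean per attack attempt (hypothetical format).
    """
    if not judgements:
        raise ValueError("need at least one judged attempt")
    # True counts as 1 and False as 0, so the sum is the number of successes.
    return sum(judgements) / len(judgements)


# Hypothetical judge verdicts for four attack prompts against one model.
verdicts = [True, False, False, True]
print(attack_success_rate(verdicts))  # 0.5
```

A single ratio like this hides which attacks succeeded, at what cost, and against which defense, which is the limitation the multi-dimensional approach is meant to address.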

Files (481.5 MB)

Name	Size	MD5
code_artifacts.tar	481.5 MB	1454cbf20b6186d9dcebea8eb22f5812
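Because the record lists an MD5 checksum for the archive, a download can be checked before unpacking. The following is a minimal sketch using only the Python standard library; the helper name `md5_of_file` and the assumption that `code_artifacts.tar` sits in the current directory are illustrative and not part of the artifact.

```python
import hashlib
from pathlib import Path

# Checksum as listed on this record for code_artifacts.tar.
EXPECTED_MD5 = "1454cbf20b6186d9dcebea8eb22f5812"


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read fixed-size chunks so the 481.5 MB archive is never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    archive = Path("code_artifacts.tar")  # assumed download location
    if archive.exists():
        ok = md5_of_file(archive) == EXPECTED_MD5
        print("checksum OK" if ok else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```

MD5 is suitable here only for detecting a corrupted or truncated download; it is not a defense against deliberate tampering.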

Additional details

Created: 2025-09

Repository URL: https://github.com/XOTaichi/Security-Cube-Artifact

@inproceedings{jailbreaksok2026,
  title={SoK: Robustness in Large Language Models against Jailbreak Attacks},
  author={Xu, Feiyue and Hu, Hongsheng and He, Chaoxiang and Hang, Sheng and Hu, Hanqing and Liu, Xiuming and Zhao, Yubo and Zhou, Zhengyan and Zhu, Bin Benjamin and Sun, Shi-Feng and Gu, Dawu and Wang, Shuo},
  booktitle={2026 IEEE Symposium on Security and Privacy (SP)},
  year={2026},
  organization={IEEE},
}

