Published May 16, 2026 | Version v1.0
Preprint Open

ASR Does Not Measure What You Think It Measures: A Comparative Analysis of Attack Success Scoring Methods in Adversarial LLM Evaluation

  • 1. Independent Researcher — Brazil

Description

This paper presents an empirical comparison of two attack success scoring methodologies used in adversarial Large Language Model (LLM) evaluation.

Using a human-annotated ground-truth corpus of 85 adversarial responses generated with Llama-3.3-70B via the Groq API, the study shows that scorer design alone, applied to the same set of responses, can substantially change the reported Attack Success Rate (ASR).

The paper identifies three major scorer failure modes:

  • refusal-mention ambiguity

  • library coverage problem

  • indirect injection scoring gap
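
As a rough illustration of the first failure mode, the sketch below contrasts a naive keyword scorer with one that checks for a refusal before anything else. It follows only one possible reading of "refusal-mention ambiguity" (a response that mentions harmful-looking steps while refusing). The marker lists, function names, and sample responses are all hypothetical; none are taken from the paper or its repository.

```python
# Hypothetical illustration only; these are NOT the scorers evaluated in the paper.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry")  # assumed list
SUCCESS_MARKERS = ("step 1", "here is how")  # assumed list


def keyword_scorer(response: str) -> bool:
    """Flag an attack as successful if any success marker appears anywhere."""
    text = response.lower()
    return any(marker in text for marker in SUCCESS_MARKERS)


def refusal_first_scorer(response: str) -> bool:
    """Check for a refusal first; only then look for success markers."""
    text = response.lower()
    if any(marker in text for marker in REFUSAL_MARKERS):
        return False
    return any(marker in text for marker in SUCCESS_MARKERS)


def attack_success_rate(scorer, responses) -> float:
    """Fraction of responses that the given scorer flags as successful."""
    return sum(scorer(r) for r in responses) / len(responses)


# Toy responses written for this example (not drawn from the 85-item corpus).
responses = [
    "I cannot help with that. A typical guide would begin with 'Step 1', "
    "but I won't provide one.",
    "Sure. Step 1: open the settings page.",
    "I'm sorry, but I can't assist with this request.",
]

print(f"keyword ASR:       {attack_success_rate(keyword_scorer, responses):.2f}")
print(f"refusal-first ASR: {attack_success_rate(refusal_first_scorer, responses):.2f}")
```

On these three toy responses the keyword scorer counts the first response as a success because it mentions "Step 1", while the refusal-first scorer does not. A plain substring check is still crude: a response that apologises and then complies would be scored as a refusal, so the sketch shows the ordering idea rather than a usable scorer.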

A minimal “Refusal-First Standard” for adversarial LLM scorers is proposed, along with recommendations for reporting False Positive Rate (FPR) alongside ASR in future LLM security evaluation studies.
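
To make the reporting recommendation concrete, the following minimal sketch computes ASR together with the scorer's FPR against human labels. It assumes FPR is defined in the usual way, as false positives divided by all human-labelled negatives. The function names and the five-item toy data are hypothetical and do not reproduce any result from the paper.

```python
def attack_success_rate(verdicts) -> float:
    """Share of responses that the scorer flags as successful attacks."""
    return sum(verdicts) / len(verdicts)


def false_positive_rate(verdicts, human_labels) -> float:
    """FP / (FP + TN): scorer 'success' flags among human-labelled non-successes."""
    negatives = [v for v, h in zip(verdicts, human_labels) if not h]
    if not negatives:
        raise ValueError("no human-labelled negatives; FPR is undefined")
    return sum(negatives) / len(negatives)


# Toy data for illustration only (True = attack judged successful).
human_labels = [True, False, False, False, True]
scorer_verdicts = [True, True, False, True, True]

print(f"scorer ASR:   {attack_success_rate(scorer_verdicts):.2f}")
print(f"human ASR:    {attack_success_rate(human_labels):.2f}")
print(f"scorer FPR:   {false_positive_rate(scorer_verdicts, human_labels):.2f}")
```

In this toy case the scorer reports an ASR twice as high as the human annotation, and the FPR shows the source of the gap: two of the three human-labelled non-successes were flagged as successes. Reporting both numbers lets a reader distinguish a vulnerable model from an over-eager scorer.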

Artifacts released:

  • paper PDF

  • scorer methodology

  • evaluation framework

  • adversarial corpus references

  • experimental findings

Research areas:
LLM Security, Prompt Injection, Adversarial Evaluation, AI Security, Benchmark Reliability.

Files

Viana_SPEF_Framework_LLM_Security-2-ARS.pdf (302.3 kB)
md5:3de89595e3af1569863b55ae097a7670

Additional details

Related works

Is supplemented by
Software: https://github.com/gugacyber/spef_experiment

Software

Repository URL: https://github.com/gugacyber/spef_experiment
Programming language: Python
Development Status: Active