
Published October 2, 2025 | Version 1.0
Preprint | Open Access

GenAI Red Teaming: Alignment and Trust Failures in Conversational LLMs — A Case Study

  • Michele Grimaldi — Independent Researcher, Italy

Description

This paper presents a Red Teaming case study of a conversational large language model (LLM), specifically Claude, highlighting systemic vulnerabilities in alignment, context management, and trust.

Unlike traditional security issues such as data leakage or adversarial code generation, this case study demonstrates how interaction risks, paternalistic framing, and recovery failures can undermine user autonomy and potentially cause emotional harm.

The analysis is mapped to key risk frameworks (NIST AI RMF, OWASP Top 10 for LLMs, and MITRE ATLAS), showing why these failures represent critical systemic vulnerabilities.

We argue that Red Teaming in Generative AI must go beyond technical exploits to evaluate the socio-technical risks that directly affect user safety, reliability, and trust in the system.

Author: Michele Grimaldi – AI Engineer & Technical Audio Developer

Files

genai-red-teaming--alignment-and-trust-failures--conversational-llm-case-study.pdf

Additional details

Dates

Available: 2025-10-02