Published October 17, 2025 | Version v1
Preprint | Open Access

Security Before Safety: A Backdoor-Centric View of LLM Output Risks in the Private AI Era

  1. North Carolina State University

Contributors

  1. North Carolina State University

Description

The rise of Private AI, driven by open-weight LLMs, parameter-efficient fine-tuning (PEFT) methods, and readily accessible hardware and software, reshapes AI risk management: security becomes an evident precondition of safety. Among emerging security threats, backdoor attacks stand out for their stealth and their targeted, devastating impact, exhibiting characteristics fundamentally different from traditional safety concerns such as misalignment and jailbreaks. This divergence has left the domain relatively underexplored. To fill this gap, we offer a unified, backdoor-centric view of three key LLM output risks: misalignment (pre-existing triggers), jailbreaks (externally discovered triggers), and backdoors (intentionally injected triggers). Through an alignment lens, these correspond, respectively, to alignment failure, brittle alignment, and "Secret Alignment", an attacker-aligned subspace activated by specific triggers. These framings highlight a shift in priorities: in the Private AI paradigm, intentional backdoors, being stealthy, persistent, controllable, and hard to audit, pose a more systemic real-world risk than misalignment or jailbreaks. Risk management should therefore pivot from average-case alignment to robust-by-design practices: treating model and supply-chain integrity as the first line of defense while enabling mechanisms for backdoor detection and purification.

Files (155.7 kB)

  • security-before-safety.pdf (155.7 kB, md5:65164612832779ebb97e382417fb1e68)