LLM-as-Specification-Judge: Multi-Model Consensus for Trustworthy Cryptographic Verification
Description
Formal verification of cryptographic implementations using proof assistants such as F* and Rocq provides strong mathematical guarantees about code correctness. However, the verification process fundamentally depends on human-written specifications that translate informal standards (e.g., NIST FIPS documents, IETF RFCs) into formal, machine-checkable predicates. These specifications constitute a critical component of the Trusted Computing Base (TCB), yet they remain vulnerable to human error, ambiguous natural-language interpretation, and subtle logical mistakes.
This paper presents Specification Consensus, a novel methodology that employs multiple independent Large Language Models (LLMs) as diverse specification generators, creating an N-version programming paradigm for formal specifications. By generating multiple independent formal specifications from the same authoritative standard and verifying cross-consistency through equivalence proofs, we establish implicit semantic bridges between natural language standards and verified implementations.
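The consensus workflow can be sketched as follows. This is a minimal illustration of the idea described above, not the authors' implementation: the helpers `generate_spec` and `prove_equivalence` are hypothetical callables standing in for an LLM-backed specification generator and a proof-assistant backend (e.g., an F* or Rocq equivalence check).

```python
from itertools import combinations
from typing import Callable, Optional, Sequence


def specification_consensus(
    models: Sequence[str],
    standard: str,
    generate_spec: Callable[[str, str], str],
    prove_equivalence: Callable[[str, str], bool],
) -> Optional[str]:
    """Generate one formal spec per model; accept only if all pairs are proven equivalent."""
    # N-version step: each model independently formalizes the same standard text.
    specs = [generate_spec(model, standard) for model in models]

    # Consensus step: every pair of specs must be proven equivalent by the prover backend.
    for spec_a, spec_b in combinations(specs, 2):
        if not prove_equivalence(spec_a, spec_b):
            # Disagreement: no specification is trusted; escalate to human review.
            return None

    # All pairwise equivalences proven: any representative serves as the consensus spec.
    return specs[0]
```

The design choice here is that a single failed equivalence proof rejects the whole batch, so the trusted artifact is never a specification that any independent generation contradicts.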
Key contributions:
- Identification and characterization of the specification trust problem in formal verification
- A multi-LLM consensus framework for generating and validating formal specifications
- Methodology for specification equivalence verification using proof assistants (illustrated after this list)
- Theoretical analysis of TCB reduction through specification diversity
- Experimental evaluation on SHA-256, AES-128, and ML-KEM cryptographic primitives
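The equivalence-verification contribution can be summarized, in notation introduced here purely for illustration (not taken from the paper), as the obligation that the N independently generated relational specifications S_1, ..., S_N of a primitive agree on its entire input domain D before any of them is accepted:

```latex
% Illustrative notation: S_i are the N independently generated relational
% specifications of one primitive over input domain D.
\[
  \bigl(\forall\, i, j \in \{1,\dots,N\},\ \forall\, x \in D:\ S_i(x) \iff S_j(x)\bigr)
  \;\Longrightarrow\;
  \text{accept any } S_i \text{ as the consensus specification.}
\]
```

For functional specifications the biconditional is replaced by equality of outputs; in either case the pairwise obligations are discharged in the proof assistant rather than assumed.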
Keywords: Formal verification, Trusted Computing Base, Large Language Models, cryptographic specifications, N-version programming, specification synthesis, F*, Rocq, high-assurance cryptography
Files
(195.1 kB)

| Name | Size |
|---|---|
| LLM-as-Spec-Judge.pdf (md5:423c9cbe03490443027e043d7c4f34c8) | 195.1 kB |