Published November 29, 2025 | Version v1
Publication Open

LLM-as-Specification-Judge: Multi-Model Consensus for Trustworthy Cryptographic Verification

  • 1. SECEQ Research

Description

Formal verification of cryptographic implementations using proof assistants such as F* and Rocq provides strong mathematical guarantees about code correctness. However, the verification process fundamentally depends on human-written specifications that translate informal standards (e.g., NIST FIPS documents, IETF RFCs) into formal machine-checkable predicates. These specifications constitute a critical component of the Trusted Computing Base (TCB), yet they remain vulnerable to human error, ambiguity in natural language interpretation, and subtle logical mistakes.

This paper presents Specification Consensus, a novel methodology that uses multiple independent Large Language Models (LLMs) as diverse specification generators, applying the N-version programming paradigm to formal specifications. By generating several independent formal specifications from the same authoritative standard and checking that they are mutually consistent through equivalence proofs, we establish implicit semantic bridges between natural language standards and verified implementations.
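
The consensus idea can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes three hypothetical candidate specifications (`ch_a`, `ch_b`, `ch_c`, labelled `model_a` to `model_c`) of the SHA-256 `Ch` function, and it substitutes differential testing on sample inputs for the equivalence proofs that the methodology carries out in a proof assistant. Candidates that agree on every sample are grouped together, and the largest group is treated as the consensus.

```python
import itertools
import random

MASK32 = 0xFFFFFFFF  # SHA-256 operates on 32-bit words


# Hypothetical candidate specifications of SHA-256 Ch(x, y, z),
# standing in for specifications produced by independent models.
def ch_a(x, y, z):
    # Textbook form: (x AND y) XOR (NOT x AND z)
    return ((x & y) ^ (~x & z)) & MASK32


def ch_b(x, y, z):
    # Algebraically equivalent form: z XOR (x AND (y XOR z))
    return (z ^ (x & (y ^ z))) & MASK32


def ch_c(x, y, z):
    # Deliberately wrong: this is Maj, not Ch
    return ((x & y) ^ (x & z) ^ (y & z)) & MASK32


def agree(f, g, samples):
    """Return True if f and g give the same output on every sample.

    Assumption: testing is only a stand-in for an equivalence proof.
    """
    return all(f(*s) == g(*s) for s in samples)


def consensus_groups(candidates, samples):
    """Partition candidates into groups that agree on all samples,
    largest group first."""
    groups = []
    for name, fn in candidates.items():
        for group in groups:
            representative = candidates[group[0]]
            if agree(fn, representative, samples):
                group.append(name)
                break
        else:
            groups.append([name])
    return sorted(groups, key=len, reverse=True)


# Seeded random inputs plus all-zeros / all-ones corner cases.
rng = random.Random(0)
samples = [tuple(rng.getrandbits(32) for _ in range(3)) for _ in range(1000)]
samples += list(itertools.product([0, MASK32], repeat=3))

candidates = {"model_a": ch_a, "model_b": ch_b, "model_c": ch_c}
groups = consensus_groups(candidates, samples)
print(groups)
```

In this sketch the two equivalent formulations end up in the majority group and the faulty candidate is isolated. Agreement on samples is weaker evidence than a machine-checked equivalence proof, which is why the methodology described here relies on proof assistants for that step.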

Key contributions:

  • Identification and characterization of the specification trust problem in formal verification
  • A multi-LLM consensus framework for generating and validating formal specifications
  • A methodology for specification equivalence verification using proof assistants
  • A theoretical analysis of TCB reduction through specification diversity
  • An experimental evaluation on the SHA-256, AES-128, and ML-KEM cryptographic primitives

Keywords: Formal verification, Trusted Computing Base, Large Language Models, cryptographic specifications, N-version programming, specification synthesis, F*, Rocq, high-assurance cryptography

Files

LLM-as-Spec-Judge.pdf (195.1 kB)
md5:423c9cbe03490443027e043d7c4f34c8