Published March 8, 2026 | Version v1
Preprint Open

Benchmarking Structural Validation Metrics for LLM-Generated Directed Graph Artifacts

Authors/Creators

Description

This paper presents a rigorous methodological framework for evaluating the structural reproducibility of directed graph artifacts generated by Large Language Models (LLMs). As LLMs are increasingly deployed to generate complex structured outputs such as workflows and architectural decompositions, establishing robust validation metrics has become a critical challenge.

The study systematically benchmarks seven graph similarity metrics (including Graph Edit Distance, Wasserstein distance, Gromov-Wasserstein, Fused Gromov-Wasserstein, and Unbalanced Fused Gromov-Wasserstein) under controlled synthetic perturbations that simulate common generative errors: semantic drift, abstraction shifts (node splits and merges), and topological hallucinations.

The key finding is that no single scalar metric is adequate. The empirical results show that standard hybrid formulations conflate benign lexical paraphrasing with severe structural failures, rendering a single aggregated score ambiguous. Rigid one-to-one alignment metrics over-penalize legitimate abstraction shifts, while single-domain metrics suffer from either semantic or structural blindness.

To resolve these validation bottlenecks, the paper proposes two calibrated strategies for automated benchmarking: a decoupled dual-metric diagnostic framework for transparent error profiling, and an engineering-led approach that uses contextually enriched node embeddings to deploy joint optimal-transport metrics without conflating the semantic and structural signals.
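The decoupled dual-metric idea described above can be illustrated with a minimal, hypothetical sketch. The graphs, labels, and the two toy scoring functions here are invented for illustration and are not the paper's actual metrics; the point is only the shape of the output: a (semantic, structural) pair instead of one aggregated scalar, so that lexical paraphrasing and a hallucinated edge remain distinguishable.

```python
from difflib import SequenceMatcher

# Toy graphs: node id -> label, plus a directed edge set.
# The generated graph paraphrases one label and hallucinates one edge.
ref_nodes = {1: "load data", 2: "clean data", 3: "train model"}
ref_edges = {(1, 2), (2, 3)}

gen_nodes = {1: "ingest data", 2: "clean data", 3: "train model"}  # lexical paraphrase
gen_edges = {(1, 2), (2, 3), (1, 3)}                               # hallucinated edge

def semantic_score(a, b):
    """Mean best-match lexical similarity between node labels, in [0, 1]."""
    return sum(
        max(SequenceMatcher(None, la, lb).ratio() for lb in b.values())
        for la in a.values()
    ) / len(a)

def structural_score(ea, eb):
    """Jaccard similarity of the directed edge sets, in [0, 1]."""
    return len(ea & eb) / len(ea | eb)

sem = semantic_score(ref_nodes, gen_nodes)
struct = structural_score(ref_edges, gen_edges)

# Reported as a pair: a fused scalar would hide whether the loss
# came from wording or from topology.
print(f"semantic={sem:.2f} structural={struct:.2f}")
```

A single weighted sum of `sem` and `struct` would assign the same score to a heavily paraphrased but structurally perfect graph and to a lexically identical graph with a spurious edge; keeping the two axes separate is what makes the error profile transparent.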

Files

Benchmarking Structural Validation Metrics for LLM-Generated Directed Graph Artifacts-March-2026.pdf