Externalising Epistemic Governance for Stateless Large Language Models: The CUL/TCL Architecture

Natangelo, Stefano

doi:10.5281/zenodo.17953956

Published December 16, 2025 | Version v1

Preprint | Open access

Natangelo, Stefano^{1, 2}

1. University of Milan
2. Fondazione IRCCS Istituto Nazionale dei Tumori

Contributors

Researcher: Natangelo, Stefano^{1, 2}

This working paper proposes a layered architectural framework for controlling, verifying, and contextualizing large language model outputs in high-stakes domains. Rather than introducing a new model or benchmark, the paper focuses on structural design principles for aligning model behavior with domain-specific constraints, verification requirements, and accountability mechanisms.

The work is intended as a conceptual and governance-oriented contribution, suitable for discussion in AI governance, evaluation, and safety contexts.

Files (744.2 kB)

Name	Size
manuscript.pdf (md5:96729ec0bb8cecd23673725947b39d47)	744.2 kB

