Effects of the Knowledge Innovation System (KIS) on Response Quality and Judgment Diversity in Large Language Models — A Mixed-Methods Analysis Using Three-Evaluator Cross-Assessment —

Hasegawa, Hiroyasu

doi:10.5281/zenodo.20116625

Published May 11, 2026 | Version v2

Preprint | Open access

Hasegawa, Hiroyasu (Researcher)

Abstract

This study uses a three-stage experimental design to investigate empirically how the Knowledge Innovation System (KIS), a question-centered protocol architecture, affects the response quality and judgment diversity of three large language models (ChatGPT, Claude Sonnet, and Gemini). In Experiment 1 (diversity experiment), the change in Shannon entropy, ΔH, was used to measure shifts in the distribution of judgments (yes/no/pending) across the model, mode, and level factors (n=30 per cell; 1,350 judgments in total). In Experiment 2 (quality evaluation experiment), a five-axis rubric with three-evaluator cross-assessment was used to compare KIS-on and KIS-off conditions directly on identical questions (No.6/28/42; n_off=840, n_on=3,240). Key findings: (1) KIS intervention increased ΔH through model-specific mechanisms (ChatGPT: redistribution type, +0.138; Claude: gate-release type, +1.566; Gemini: bias-release type, +1.004). (2) After correction for self-evaluation bias, KIS-on Creative Lv.5 scores were statistically equivalent to KIS-off scores (p=.053, n.s.; d=−0.016). (3) Response texts under KIS-on Creative Lv.5 exhibited qualitatively distinct cognitive operations, characterized as concept-redefinition patterns. These findings suggest that KIS functions as a structural catalyst: it activates cognitive operations that conventional evaluation rubrics do not fully capture, and it produces judgment diversity through model-specific mechanisms rather than through a uniform effect.
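The ΔH measure in Experiment 1 can be illustrated with a minimal sketch. It assumes that ΔH is the entropy of the KIS-on judgment distribution minus the entropy of the KIS-off distribution within one cell, and that entropy is measured in bits (base-2 logarithm). The abstract does not state the base; base 2 is at least consistent with the reported values, since three categories cap the entropy at log2(3) ≈ 1.585 bits and the largest reported ΔH is +1.566. The function names and the sample judgments are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(judgments, base=2.0):
    """Shannon entropy of a list of categorical judgments.

    The base is an assumption (2.0 gives bits); the abstract does not state it.
    """
    counts = Counter(judgments)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one judgment")
    entropy = 0.0
    # Counter only holds categories that occurred, so p is never zero here.
    for count in counts.values():
        p = count / total
        entropy -= p * math.log(p, base)
    return entropy


def delta_h(judgments_off, judgments_on, base=2.0):
    """Entropy change from the KIS-off to the KIS-on condition (assumed definition)."""
    return shannon_entropy(judgments_on, base) - shannon_entropy(judgments_off, base)


# Hypothetical cell with n=30 judgments per condition (not data from the paper).
kis_off = ["yes"] * 30                                  # every judgment identical
kis_on = ["yes"] * 10 + ["no"] * 10 + ["pending"] * 10  # evenly spread

print(f"H(off) = {shannon_entropy(kis_off):.3f} bits")
print(f"H(on)  = {shannon_entropy(kis_on):.3f} bits")
print(f"dH     = {delta_h(kis_off, kis_on):.3f} bits")
```

In this illustrative extreme, the KIS-off cell has zero entropy and the KIS-on cell reaches the three-category maximum, so ΔH equals log2(3). Cells whose judgments are already spread out before the intervention leave less room for an increase.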

Keywords: Knowledge Innovation System, large language models, Shannon entropy, judgment diversity, evaluator bias, cognitive phase transition

Abstract (Japanese)

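This study examined, both quantitatively and qualitatively and using a three-stage experimental design, the effects of the Knowledge Innovation System (KIS), a question-centered protocol architecture, on the response quality and judgment diversity of three models: ChatGPT, Claude Sonnet, and Gemini. In Experiment 1 (diversity experiment), changes in the judgment distribution before and after KIS intervention were analyzed across three factors (model, mode, and level), with Shannon entropy ΔH as the index (n=30 per cell; 1,350 judgments in total). In Experiment 2 (quality evaluation experiment), three-evaluator cross-assessment scores on a five-axis rubric were used to compare KIS-on and KIS-off conditions directly on identical questions (No.6/28/42; n_off=840, n_on=3,240). Key findings: (1) KIS intervention increases ΔH through model-specific mechanisms (ChatGPT: redistribution type, +0.138; Claude: gate-release type, +1.566; Gemini: bias-release type, +1.004). (2) After correction for self-evaluation bias, KIS-on Creative Lv.5 reaches scores statistically equivalent to those of KIS-off (p=.053, n.s.). (3) Response texts under KIS-on Creative Lv.5 show a qualitatively different cognitive operation, termed the "concept-redefinition type".

Keywords: Knowledge Innovation System, large language models, Shannon entropy, judgment diversity, evaluator bias, phase transition of thinking operations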

Files

KIS_LLM_Diversity_Quality_Hasegawa_2026_EN_ver2_0.pdf (807.6 kB), md5:133925d29f42bc3cee412eea55520760

Additional details

Translated title (Japanese): Knowledge Innovation System(KIS)が大規模言語モデルの回答品質と判断多様性に与える効果—— 三者クロス評価実験による定量・定性統合分析 ——

