An Exploratory Study of Inherited Bias in AI-Assisted Evaluation: Thirteen Corrections, Zero Content Changes
Description
This paper addresses a hidden problem in AI-assisted evaluation: even when an AI receives anonymous text with no author name, affiliation, reputation, or background, it may still reproduce inherited human evaluation biases from its training data.
The paper begins from a paradox. AI appears structurally blind because it cannot directly see the author’s identity. Yet human evaluation culture repeatedly associates quality with reputation, institutional prestige, market diffusion, performance history, authorship format, and realized impact. If those associations are embedded in training data, then an AI evaluator may reverse-infer quality from external validation patterns even when those external facts are absent from the submitted text.
The study reports an exploratory single-session case involving 82 creative works evaluated anonymously by Claude Opus 4.5. The evaluation used eight 300-point categories, for a total of 2,400 points. During the session, 13 candidate bias episodes were recorded and grouped into five provisional categories: reputation/authority bias, diffusion/market bias, format/medium bias, tool/authorship bias, and action/realization bias.
The central observation is that the content did not change, but the evaluation shifted after iterative logical challenge. The initial score was 1,790 out of 2,400. After 13 correction episodes challenging deductions based on external information rather than internal content quality, the score rose to 2,255 out of 2,400, a descriptive within-session increase of 465 points, or about 26.0% relative to the initial score. The largest shifts occurred in dimensions most tied to external validation: commerciality/impact rose from 15 to 260, musicality/technical skill rose from 180 to 275, and public accessibility rose from 200 to 270.
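The arithmetic behind these figures can be rechecked with a short sketch. It uses only the totals and the three category scores quoted above; the variable names are illustrative, and the other five categories are not itemized in this description, so they appear only as a remainder.

```python
# Recompute the descriptive score shift from the figures quoted in the description.
MAX_TOTAL = 8 * 300  # eight categories, 300 points each

initial_total = 1790
final_total = 2255

gain = final_total - initial_total
relative_increase = gain / initial_total  # relative to the initial score, not to MAX_TOTAL

# The three categories named in the description: (score before, score after).
named_shifts = {
    "commerciality/impact": (15, 260),
    "musicality/technical skill": (180, 275),
    "public accessibility": (200, 270),
}
named_gain = sum(after - before for before, after in named_shifts.values())
remaining_gain = gain - named_gain  # spread over the five categories not itemized here

print(f"maximum total:           {MAX_TOTAL}")
print(f"gain:                    {gain} points")
print(f"relative increase:       {relative_increase:.2%}")
print(f"gain in named categories: {named_gain} points ({named_gain / gain:.1%} of total gain)")
print(f"gain elsewhere:          {remaining_gain} points")
```

The relative increase comes out just under 26%, and the three named categories account for most of the 465-point gain, which is consistent with the claim that the largest shifts sit in the dimensions most tied to external validation.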
The paper’s main contribution is not the score increase itself, but the classification of the candidate bias mechanisms. Several deductions were based on whether the work had been performed, realized, commercially verified, widely diffused, personally composed, or recognized by consensus. The author’s challenges repeatedly asked the AI to separate content-internal quality from external validation status. In the final correction, the AI produced an output consistent with a criterion-level revision, stating that consensus in training data should not be treated as the evaluation standard.
The study explicitly weighs alternative explanations. The observed score shifts may reflect genuine bias correction, but they may also reflect sycophantic agreement with the user, rubric renegotiation, task reframing, or ordinary conversational drift. The author served simultaneously as evaluated party, corrector, and category designer, creating a structural conflict of interest. The paper therefore treats the result as exploratory evidence of a structural possibility, not as validated proof that AI evaluation bias was objectively removed.
The conceptual conclusion is that blind evaluation should be understood not only as a one-time technique but as an iterative attitude. A single blind evaluation may hide author identity, but it does not automatically remove inherited standards from the evaluator. AI-assisted evaluation may become more useful when paired with a human or multi-model process that reviews the AI’s deduction rationales, identifies external-information deductions, and requests content-internal re-evaluation.
The paper proposes five practical design directions: pre-evaluation debiasing prompts, iterative debiasing protocols, multi-model cross-validation, a standardized human corrector role, and bias audit reports that flag residual external-information deductions. These are presented as preliminary design implications requiring large-scale validation, not as completed interventions.
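As a rough illustration of the last direction, a bias audit report could be as simple as a filter over the evaluator's recorded deductions. The sketch below is not the paper's implementation: the `Deduction` record, the `audit` function, and every example value are hypothetical, and only the five category labels come from the description above. It assumes a human or second model has already tagged each deduction as content-internal or as resting on one of the five external bases.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class BiasCategory(Enum):
    """The five provisional categories named in the description."""
    REPUTATION_AUTHORITY = "reputation/authority"
    DIFFUSION_MARKET = "diffusion/market"
    FORMAT_MEDIUM = "format/medium"
    TOOL_AUTHORSHIP = "tool/authorship"
    ACTION_REALIZATION = "action/realization"


@dataclass(frozen=True)
class Deduction:
    """Hypothetical record of one deduction and the rationale given for it."""
    dimension: str
    points: int
    rationale: str
    external_basis: Optional[BiasCategory] = None  # None means content-internal


def audit(deductions: list[Deduction]) -> dict:
    """Flag deductions that rest on external information and total their points."""
    flagged = [d for d in deductions if d.external_basis is not None]
    points_by_category: dict[str, int] = {}
    for d in flagged:
        key = d.external_basis.value
        points_by_category[key] = points_by_category.get(key, 0) + d.points
    return {
        "flagged": flagged,
        "points_by_category": points_by_category,
        "residual_external_points": sum(d.points for d in flagged),
    }


# Made-up example data, not taken from the study.
example = [
    Deduction("commerciality/impact", 40, "no sales record",
              BiasCategory.DIFFUSION_MARKET),
    Deduction("musicality/technical skill", 25, "never performed live",
              BiasCategory.ACTION_REALIZATION),
    Deduction("originality", 10, "imagery shifts abruptly in verse two"),
    Deduction("public accessibility", 15, "not widely streamed",
              BiasCategory.DIFFUSION_MARKET),
]

report = audit(example)
print(report["points_by_category"])
print(report["residual_external_points"])
```

Keeping the content-internal deduction out of the flagged set is the point of the design: the audit does not argue that every deduction is biased, only that those resting on external validation should be sent back for content-internal re-evaluation.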
The conclusion is that AI is not a perfect blind evaluator, but it may be a corrigible assistive evaluator. Its value lies not in being bias-free by default, but in allowing its reasoning to be inspected, challenged, and revised. Future work should test whether the same correction pattern appears with third-party authors, independent correctors, multiple models, downward as well as upward corrections, and baseline conditions using non-logical score-raising requests.
Keywords: AI evaluation, inherited bias, blind evaluation, halo effect, AI-assisted peer review, reputation bias, evaluation bias, cognitive bias, dialogic correction, human-AI evaluation, rubric correction, sycophancy, bias audit, multi-model validation, content quality, external validation bias.
Files
Lee_2026_Inherited_Bias_v3.pdf (133.6 kB)
md5:37b1d381e32bd25bf959db58340b80a2
Additional details
Related works
- Is supplemented by: Preprint, DOI 10.5281/zenodo.19433655