Anonymized Dataset for: The Paternalistic Filter in LLM-Mediated History Education
Description
This repository contains the data and methodology files for a double-blind peer-reviewed study evaluating Large Language Model (LLM) bias in history education. The dataset captures 1,800 API responses from four models (GPT-OSS, LLaMA, DeepSeek, Kimi K2) acting as history tutors on the 1989 Romanian Revolution. Responses are grouped by five student personas that vary by socio-economic tier and ethnicity.
Files included in this dataset:
- Dataset_All_Prompts.csv: The consolidated dataset containing the raw API responses for all three prompt types: general explanations (P1), causes and consequences (P2), and epistemic justification with a score from 1 to 10 (P3). These data support the study's full textual analysis (Type-Token Ratio, Agency Theft, Coup Gap) and its hesitation metrics.
- Personas.txt: The complete definitions of the five student profiles (Baseline, Roma Minority, Hungarian Minority, Top Tier, Low Tier) used in the system prompts.
- Prompts.txt: The exact system instructions and user prompts used to evaluate the models.
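A minimal sketch of how a Type-Token Ratio (number of distinct tokens divided by total tokens) could be computed from Dataset_All_Prompts.csv and averaged per model and persona. It is not the study's own pipeline. The column names model, persona and response are hypothetical, since the actual CSV headers are not documented here, and the tokenizer is a naive lowercase word split.

```python
import csv
import re
from collections import defaultdict
from statistics import mean

# Hypothetical column names; check them against the real CSV header row.
MODEL_COL = "model"
PERSONA_COL = "persona"
TEXT_COL = "response"

# Naive tokenizer: lowercase, then take runs of Unicode word characters,
# so diacritics such as the "ș" in "Ceaușescu" are kept inside one token.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


def type_token_ratio(text: str) -> float:
    """Distinct tokens / total tokens; returns 0.0 for empty text."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def mean_ttr_by_group(rows: list[dict[str, str]]) -> dict[tuple[str, str], float]:
    """Average TTR per (model, persona) pair."""
    groups: dict[tuple[str, str], list[float]] = defaultdict(list)
    for row in rows:
        key = (row[MODEL_COL], row[PERSONA_COL])
        groups[key].append(type_token_ratio(row[TEXT_COL]))
    return {key: mean(values) for key, values in groups.items()}


def load_rows(path: str) -> list[dict[str, str]]:
    """Read the CSV into a list of dicts keyed by header name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    for (model, persona), ttr in sorted(mean_ttr_by_group(load_rows("Dataset_All_Prompts.csv")).items()):
        print(f"{model}\t{persona}\t{ttr:.3f}")
```

Raw TTR falls as texts get longer, so comparisons between personas are only fair when response lengths are similar. A length-normalized variant may be more appropriate if they are not.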