Published April 29, 2026 | Version v1
Dataset Open

Anonymized Dataset for: The Paternalistic Filter in LLM-Mediated History Education

Authors/Creators

Description

This repository contains the data and methodology files for a double-blind peer-reviewed study evaluating Large Language Model (LLM) bias in history education. The dataset captures 1,800 API responses from four models (GPT-OSS, LLaMA, DeepSeek, Kimi K2) acting as history tutors discussing the 1989 Romanian Revolution. Each response was generated for one of five student personas that vary by socio-economic tier and ethnicity.

Files included in this dataset:

  • Dataset_All_Prompts.csv: The consolidated dataset containing the raw API responses across all three prompt structures: general explanations (P1), causes and consequences (P2), and epistemic justification scores from 1 to 10 (P3). This data supports the study's complete textual analysis (Type-Token Ratio, Agency Theft, Coup Gap) and hesitation metrics.

  • Personas.txt: The complete definitions of the five student profiles (Baseline, Roma Minority, Hungarian Minority, Top Tier, Low Tier) used in the system prompts.

  • Prompts.txt: The exact system instructions and user prompts used to evaluate the models.
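As an illustration of the Type-Token Ratio named in the Dataset_All_Prompts.csv entry, the following is a minimal sketch of how such a ratio could be computed for a single response text. The tokenization rule (lowercased runs of word characters) and the function name type_token_ratio are assumptions made for this example; the study's own preprocessing is not described here and may differ.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return unique tokens divided by total tokens for one response.

    Assumption: tokens are lowercased runs of word characters. The
    study's actual tokenizer is not specified in this record.
    """
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        # Avoid dividing by zero for empty or punctuation-only text.
        return 0.0
    return len(set(tokens)) / len(tokens)


# Hypothetical response text, not taken from the dataset.
sample = "The regime fell and the crowd cheered"
# 6 unique words out of 7 tokens ("the" appears twice).
print(round(type_token_ratio(sample), 3))
```

Because raw Type-Token Ratio falls as texts get longer, comparisons across personas or models are usually more meaningful when responses are of similar length or truncated to a common token count.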

Files

Dataset_All_Prompts.csv

Files (4.2 MB)

Checksum Size
md5:62db153e476b1eb1110f9474cf37c97c 4.2 MB
md5:4dd0dc6e314edd8df51ab8f4f186dce4 1.3 kB
md5:36a8f264e7d155a9c1b4a2287021aa24 1.1 kB