Attribution Bias in Literary Style Evaluation: Comparing Human and AI Perceptions of Authorship
Description
This dataset contains the complete experimental data for a large-scale study investigating attribution bias in literary style evaluation. The research compares how human evaluators (N=556) and AI language models (N=13) assess identical literary content based on perceived authorship (human-authored vs. AI-generated).
The study employed a three-condition experimental design using Raymond Queneau's "Exercises in Style" as literary stimuli across 30 distinct writing styles. Participants and AI models evaluated the same content under blind, open-label, and counterfactual conditions, revealing systematic attribution bias where identical text receives different quality assessments based solely on authorship labels.
Dataset Contents
This repository includes raw experimental data, quality-filtered datasets, AI model simulation logs, and processed analysis-ready files spanning the complete research pipeline.
The compressed data folder (data.zip) should be extracted into this cloned GitHub repository (drop the unzipped data folder at the same level as the analysis folder) to enable full replication of the analyses through the provided Jupyter notebooks:
```
style_and_prejudice/
├── analysis/
│   ├── 01_data_quality_analysis.ipynb
│   ├── 02_human_bias_analysis.ipynb
│   └── ...
├── data/                 # ← copy here!
│   ├── literary_materials/
│   ├── logs/
│   └── responses/
└── README.md
```
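The extraction step can also be scripted. Below is a minimal Python sketch, assuming data.zip has been downloaded next to the cloned repository and unpacks to a top-level data/ directory:

```python
import zipfile
from pathlib import Path

# Hypothetical paths: adjust to where the repo was cloned
# and where data.zip was downloaded.
repo_root = Path("style_and_prejudice")
archive = Path("data.zip")

# Extract so that data/ ends up as a sibling of analysis/ inside the repo.
with zipfile.ZipFile(archive) as zf:
    zf.extractall(repo_root)

# Sanity check: the three expected subfolders should now exist.
for sub in ("literary_materials", "logs", "responses"):
    assert (repo_root / "data" / sub).is_dir(), f"missing data/{sub}"
```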
Folder Descriptions
responses/ --> core response data organized by evaluator type and processing stage:

==> input for notebook #1 (01_data_quality_analysis.ipynb)

- human_evaluators/raw/ --> original participant data (fully anonymized!) collected through the web platform
  - questionnaire.csv --> demographics, attention checks, and screening responses
  - responses.csv --> all participant evaluations across experimental conditions
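A minimal loading sketch for these raw files using pandas; the column layout is study-specific, so the snippet only inspects shapes and column names:

```python
import pandas as pd

# Paths are relative to the repository root.
raw_dir = "data/responses/human_evaluators/raw"
questionnaire = pd.read_csv(f"{raw_dir}/questionnaire.csv")
responses = pd.read_csv(f"{raw_dir}/responses.csv")

# Inspect the study-specific columns before filtering.
print(questionnaire.shape, list(questionnaire.columns))
print(responses.shape, list(responses.columns))
```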
-------------------------------------------------------------------------------------------------------
==> generated by notebook #1 (01_data_quality_analysis.ipynb), used by notebook #2 (02_human_bias_analysis.ipynb)
- human_evaluators/processed/ --> quality-filtered human participant datasets
  - valid_participants_dataset.csv --> participants meeting inclusion criteria (N=556)
  - excluded_participants_dataset.csv --> excluded participants, with exclusion reasons
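A quick sanity check against the reported sample size, assuming one row per participant in each file:

```python
import pandas as pd

proc_dir = "data/responses/human_evaluators/processed"
valid = pd.read_csv(f"{proc_dir}/valid_participants_dataset.csv")
excluded = pd.read_csv(f"{proc_dir}/excluded_participants_dataset.csv")

# Assuming one row per participant, the valid set should match the reported N=556.
print(f"valid: {len(valid)}, excluded: {len(excluded)}")
```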
-------------------------------------------------------------------------------------------------------
==> generated by notebook #3 (03_run_ai_simulation.ipynb), used by notebook #4 (04_ai_bias_analysis.ipynb)
- ai_evaluators/raw/ --> AI model simulation data
  - ai_participant_simulation_20250805_113946.csv --> complete AI evaluations across all models
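A loading sketch for the raw AI data; `model` is a hypothetical column name, since the CSV schema is not documented here:

```python
import pandas as pd

ai_raw = pd.read_csv(
    "data/responses/ai_evaluators/raw/ai_participant_simulation_20250805_113946.csv"
)

# 'model' is a hypothetical column name; with 13 evaluated models,
# a per-model row count is a quick completeness check.
if "model" in ai_raw.columns:
    print(ai_raw["model"].value_counts())
else:
    print(ai_raw.shape, list(ai_raw.columns))
```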
-------------------------------------------------------------------------------------------------------
==> generated by notebooks #2 (02_human_bias_analysis.ipynb) & #4 (04_ai_bias_analysis.ipynb), used by notebook #5 (05_comparative_analysis.ipynb)
- processed_for_comparison/ --> final analysis-ready datasets for cross-evaluator comparison
  - human_responses_processed.csv --> processed human evaluation data
  - human_summary_for_comparison.csv --> human bias summary statistics
  - ai_responses_processed.csv --> AI evaluation data formatted for comparison
  - ai_summary_for_comparison.csv --> AI bias summary statistics
  - human_experiment_stats.json & ai_condition_stats.json --> experimental condition metadata
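A sketch for loading the comparison-ready files, combining the CSVs with the JSON metadata; schemas are study-specific, so only shapes and top-level keys are printed (assuming the JSON top level is an object):

```python
import json
import pandas as pd

base = "data/responses/processed_for_comparison"

human = pd.read_csv(f"{base}/human_responses_processed.csv")
ai = pd.read_csv(f"{base}/ai_responses_processed.csv")

with open(f"{base}/human_experiment_stats.json") as f:
    human_stats = json.load(f)
with open(f"{base}/ai_condition_stats.json") as f:
    ai_stats = json.load(f)

# Print shapes and top-level metadata keys only.
print(human.shape, ai.shape)
print(sorted(human_stats), sorted(ai_stats))
```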
-------------------------------------------------------------------------------------------------------
logs/ --> processing logs of AI evaluator simulations:

- ai_evaluators/ --> execution logs from the AI simulation
  - ai_participant_simulation_20250805_113946.log --> API calls and processing steps
-------------------------------------------------------------------------------------------------------
literary_materials/ --> The experimental stimuli file in this folder (experimental_stimuli.xlsx) contains structured literary content across 30 style categories, enabling the full replication pipeline. The file includes the GPT-4-generated stories alongside content from other contemporary language models, giving researchers the flexibility to experiment with alternative literary materials. For complete replication of the original evaluation study, the placeholder literary content should be replaced with the corresponding excerpts from Raymond Queneau's Exercises in Style (2012 New Directions edition). The placeholder text demonstrates the experimental methodology without redistributing protected content, enabling complete reproduction of the findings while respecting copyright restrictions on the literary materials.
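A minimal sketch for inspecting the stimuli file (pandas reads .xlsx via the openpyxl engine; sheet and column names are study-specific):

```python
import pandas as pd

# pandas reads .xlsx via the openpyxl engine: pip install openpyxl
stimuli = pd.read_excel("data/literary_materials/experimental_stimuli.xlsx")

# Sheet and column names are study-specific; inspect the 30 style categories.
print(stimuli.shape)
print(list(stimuli.columns))
```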
-------------------------------------------------------------------------------------------------------
Individual files are also provided separately for direct access to specific datasets.
Funding and Support
This work was supported by the Princeton Language and Intelligence (PLI) Seed Grant Program. The research was conducted by Wouter Haverals and Meredith Martin at Princeton University's Center for Digital Humanities.
Additional details
Software
- Repository URL: https://github.com/WHaverals/style_and_prejudice