Attribution Bias in Literary Style Evaluation: Comparing Human and AI Perceptions of Authorship
Description
This dataset contains the complete experimental data for a large-scale study investigating attribution bias in literary style evaluation. The research compares how human evaluators (N=556) and AI language models (N=13) assess identical literary content based on perceived authorship (human-authored vs. AI-generated).
The study employed a three-condition experimental design using Raymond Queneau's "Exercises in Style" as literary stimuli across 30 distinct writing styles. Participants and AI models evaluated the same content under blind, open-label, and counterfactual conditions, revealing systematic attribution bias where identical text receives different quality assessments based solely on authorship labels.
Dataset Contents
This repository includes raw experimental data, quality-filtered datasets, AI model simulation logs, and processed analysis-ready files spanning the complete research pipeline.
The compressed data folder (data.zip) should be extracted into this cloned GitHub repository (drop the unzipped data folder at the same level as the analysis folder) to enable full replication of the analyses through the provided Jupyter notebooks:
```
style_and_prejudice/
├── analysis/
│   ├── 01_data_quality_analysis.ipynb
│   ├── 02_human_bias_analysis.ipynb
│   └── ...
├── data/                 # ← copy here!
│   ├── literary_materials/
│   ├── logs/
│   └── responses/
└── README.md
```
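The extraction step can also be scripted. Below is a minimal Python sketch, assuming data.zip has been downloaded next to the cloned repository and unpacks to a top-level data/ directory:

```python
import zipfile
from pathlib import Path

# Hypothetical paths: adjust to where the repo was cloned
# and where data.zip was downloaded.
repo_root = Path("style_and_prejudice")
archive = Path("data.zip")

# Extract so that data/ ends up as a sibling of analysis/ inside the repo.
with zipfile.ZipFile(archive) as zf:
    zf.extractall(repo_root)

# Sanity check: the three expected subfolders should now exist.
for sub in ("literary_materials", "logs", "responses"):
    assert (repo_root / "data" / sub).is_dir(), f"missing data/{sub}"
```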
Folder Descriptions
responses/ --> core response data organized by evaluator type and processing stage:

==> input for notebook #1 (01_data_quality_analysis.ipynb)

- human_evaluators/raw/ --> original participant data (fully anonymized!) collected through the web platform
  - questionnaire.csv --> demographics, attention checks, and screening responses
  - responses.csv --> all participant evaluations across experimental conditions
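A minimal loading sketch for these raw files using pandas; the column layout is study-specific, so the snippet only inspects shapes and column names:

```python
import pandas as pd

# Paths are relative to the repository root.
raw_dir = "data/responses/human_evaluators/raw"
questionnaire = pd.read_csv(f"{raw_dir}/questionnaire.csv")
responses = pd.read_csv(f"{raw_dir}/responses.csv")

# Inspect the study-specific columns before filtering.
print(questionnaire.shape, list(questionnaire.columns))
print(responses.shape, list(responses.columns))
```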
-------------------------------------------------------------------------------------------------------
==> generated by notebook #1 (01_data_quality_analysis.ipynb), used by notebook #2 (02_human_bias_analysis.ipynb)
- human_evaluators/processed/ --> quality-filtered human participant datasets
  - valid_participants_dataset.csv --> participants meeting inclusion criteria (N=556)
  - excluded_participants_dataset.csv --> excluded participants, with exclusion reasons
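A quick sanity check against the reported sample size, assuming one row per participant in each file:

```python
import pandas as pd

proc_dir = "data/responses/human_evaluators/processed"
valid = pd.read_csv(f"{proc_dir}/valid_participants_dataset.csv")
excluded = pd.read_csv(f"{proc_dir}/excluded_participants_dataset.csv")

# Assuming one row per participant, the valid set should match the reported N=556.
print(f"valid: {len(valid)}, excluded: {len(excluded)}")
```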
-------------------------------------------------------------------------------------------------------
==> generated by notebook #3 (03_run_ai_simulation.ipynb), used by notebook #4 (04_ai_bias_analysis.ipynb)
- ai_evaluators/raw/ --> AI model simulation data
  - ai_participant_simulation_20250805_113946.csv --> complete AI evaluations across all models
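A loading sketch for the raw AI data; `model` is a hypothetical column name, since the CSV schema is not documented here:

```python
import pandas as pd

ai_raw = pd.read_csv(
    "data/responses/ai_evaluators/raw/ai_participant_simulation_20250805_113946.csv"
)

# 'model' is a hypothetical column name; with 13 evaluated models,
# a per-model row count is a quick completeness check.
if "model" in ai_raw.columns:
    print(ai_raw["model"].value_counts())
else:
    print(ai_raw.shape, list(ai_raw.columns))
```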
-------------------------------------------------------------------------------------------------------
==> generated by notebooks #2 (02_human_bias_analysis.ipynb) & #4 (04_ai_bias_analysis.ipynb), used by notebook #5 (05_comparative_analysis.ipynb)
- processed_for_comparison/ --> final analysis-ready datasets for cross-evaluator comparison
  - human_responses_processed.csv --> processed human evaluation data
  - human_summary_for_comparison.csv --> human bias summary statistics
  - ai_responses_processed.csv --> AI evaluation data formatted for comparison
  - ai_summary_for_comparison.csv --> AI bias summary statistics
  - human_experiment_stats.json & ai_condition_stats.json --> experimental condition metadata
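A sketch for loading the comparison-ready files, combining the CSVs with the JSON metadata; schemas are study-specific, so only shapes and top-level keys are printed (assuming the JSON top level is an object):

```python
import json
import pandas as pd

base = "data/responses/processed_for_comparison"

human = pd.read_csv(f"{base}/human_responses_processed.csv")
ai = pd.read_csv(f"{base}/ai_responses_processed.csv")

with open(f"{base}/human_experiment_stats.json") as f:
    human_stats = json.load(f)
with open(f"{base}/ai_condition_stats.json") as f:
    ai_stats = json.load(f)

# Print shapes and top-level metadata keys only.
print(human.shape, ai.shape)
print(sorted(human_stats), sorted(ai_stats))
```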
-------------------------------------------------------------------------------------------------------
logs/ --> processing logs of AI evaluator simulations:

- ai_evaluators/ --> execution logs from the AI simulation
  - ai_participant_simulation_20250805_113946.log --> API calls and processing steps
-------------------------------------------------------------------------------------------------------
literary_materials/ --> The experimental stimuli file in this folder (experimental_stimuli.xlsx) contains structured literary content across 30 style categories, enabling the full replication pipeline. The file includes the GPT-4-generated stories alongside content from other contemporary language models, giving researchers the flexibility to experiment with alternative literary materials. For complete replication of the original evaluation study, the placeholder literary content should be replaced with the corresponding excerpts from Raymond Queneau's Exercises in Style (2012 New Directions edition). The placeholder text demonstrates the experimental methodology without redistributing protected content, enabling complete reproduction of the findings while respecting copyright restrictions on the literary materials.
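A minimal sketch for inspecting the stimuli file (pandas reads .xlsx via the openpyxl engine; sheet and column names are study-specific):

```python
import pandas as pd

# pandas reads .xlsx via the openpyxl engine: pip install openpyxl
stimuli = pd.read_excel("data/literary_materials/experimental_stimuli.xlsx")

# Sheet and column names are study-specific; inspect the 30 style categories.
print(stimuli.shape)
print(list(stimuli.columns))
```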
-------------------------------------------------------------------------------------------------------
Individual files are also provided separately for direct access to specific datasets.
Funding and Support
This work was supported by the Princeton Language and Intelligence (PLI) Seed Grant Program. The research was conducted by Wouter Haverals and Meredith Martin at Princeton University's Center for Digital Humanities.
Additional details
Software
- Repository URL: https://github.com/WHaverals/style_and_prejudice