README — Anonymized Survey Data
"Does Unfairness Hurt Women? The Effects of Losing Unfair Competitions"
Piasenti, Valente, Van Veldhuizen, Pfeifer — Economic Journal


1. OVERVIEW
-----------
This folder contains anonymized data from two pre-registered survey experiments
conducted on Prolific as follow-up studies to the main experiment. The surveys
were designed to elicit fairness perceptions of the tournament procedures used
in the main experiment.

The anonymization script that produced these files (anonymize_survey.R) is
stored in the raw_survey_data folder (not distributed). Sensitive columns
have been removed or set to NA. Participant IDs have been transformed using a
deterministic rule so that the merge key remains consistent across files and
with the main experiment data. All rating variables are untouched.


2. SURVEY DESIGN
----------------

2.1  Survey 1 — Meritocratic vs Unfair
     Conducted on Prolific in May 2024.
     Participants were recruited using the same selection criteria as the main
     experiment (Prolific participants balanced on gender, residing in the US
     or UK). Participants who had already taken part in the main experiment
     were excluded. The study was pre-registered at https://osf.io/4f97v.

     Participants read vignettes describing two tournament procedures:
       Study 1 (Meritocratic): winner is always the higher-performing player.
       Study 2 (Unfair):       winner is the higher-performing player with
                               75% probability and the lower-performing player
                               with 25% probability.
     For each procedure, participants rated three dimensions:
       - Fairness
       - Merit (whether the outcome reflects the player's merit)
       - Random chance (whether the outcome is due to chance)
     Each respondent was randomly assigned to one of three question-block
     orderings (blocks 1.1/1.2/1.3 for Study 1; 2.1/2.2/2.3 for Study 2),
     so for each study only the columns of the assigned block are
     non-missing. The replication code coalesces the three block columns
     (taking the first non-missing value) to obtain one value per
     dimension per respondent.
     Median completion time: 3 minutes 40 seconds.
     Participation fee: £1.50.
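
     A minimal Python sketch of the coalescing step follows. It is an
     illustration, not the replication code (which is written in R). It
     assumes each respondent is a dict keyed by the column names listed in
     Section 3 (e.g. "Fairness 1.2_1"), with None standing in for NA. The
     helper names coalesce and coalesce_blocks are hypothetical, and the
     ratings in the example are arbitrary illustrative values.

```python
from typing import Optional

DIMENSIONS = ("Fairness", "Merit", "Random chance")
BLOCKS = (1, 2, 3)


def coalesce(*values: Optional[float]) -> Optional[float]:
    """Return the first non-missing value (None plays the role of NA)."""
    for v in values:
        if v is not None:
            return v
    return None


def coalesce_blocks(row: dict, study: int) -> dict:
    """Collapse one study's block-specific columns into one value per dimension.

    Hypothetical helper: column names follow "<Dimension> <study>.<block>_1".
    """
    return {
        dim: coalesce(*(row.get(f"{dim} {study}.{b}_1") for b in BLOCKS))
        for dim in DIMENSIONS
    }


# Illustrative respondent assigned to block 2 of Study 1 (arbitrary values).
row = {
    "Fairness 1.1_1": None, "Fairness 1.2_1": 6, "Fairness 1.3_1": None,
    "Merit 1.1_1": None, "Merit 1.2_1": 5, "Merit 1.3_1": None,
    "Random chance 1.1_1": None, "Random chance 1.2_1": 2, "Random chance 1.3_1": None,
}
print(coalesce_blocks(row, study=1))
# {'Fairness': 6, 'Merit': 5, 'Random chance': 2}
```

     Because only one block is filled per respondent, the order in which
     blocks are passed to coalesce does not affect the result.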

2.2  Survey 2 — Meritocratic vs Risk-only
     Conducted on Prolific to compare fairness perceptions between the
     Meritocratic Treatment and a purely chance-based (risk-only) task
     in which the winner was entirely determined by a coin flip.
     The study was pre-registered at https://osf.io/4f97v.
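
     The three winner-determination rules can be made concrete with a small
     simulation. This is a hypothetical sketch, not part of the replication
     package. It assumes two players with continuous (hence almost surely
     distinct) performance scores, an Unfair rule in which the higher
     performer wins with probability 0.75 (as described for Survey 1), and
     a Risk-only rule that is a fair coin flip. All function names are
     illustrative.

```python
import random


def meritocratic_winner(perf_a: float, perf_b: float, rng: random.Random) -> str:
    """Meritocratic: the higher performer always wins."""
    return "A" if perf_a > perf_b else "B"


def unfair_winner(perf_a: float, perf_b: float, rng: random.Random,
                  p_merit: float = 0.75) -> str:
    """Unfair: the higher performer wins with probability p_merit, else the lower one."""
    higher = "A" if perf_a > perf_b else "B"
    lower = "B" if higher == "A" else "A"
    return higher if rng.random() < p_merit else lower


def risk_only_winner(perf_a: float, perf_b: float, rng: random.Random) -> str:
    """Risk-only: a fair coin flip; performance plays no role."""
    return "A" if rng.random() < 0.5 else "B"


def share_won_by_higher(rule, n: int = 100_000, seed: int = 1) -> float:
    """Fraction of simulated contests won by the higher performer."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n):
        perf_a, perf_b = rng.random(), rng.random()
        higher = "A" if perf_a > perf_b else "B"
        wins += rule(perf_a, perf_b, rng) == higher
    return wins / n


for name, rule in [("Meritocratic", meritocratic_winner),
                   ("Unfair", unfair_winner),
                   ("Risk-only", risk_only_winner)]:
    print(f"{name:12s} {share_won_by_higher(rule):.3f}")
```

     The printed share is exactly 1.000 for the Meritocratic rule and close
     to 0.75 and 0.50 for the Unfair and Risk-only rules.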

     Participants rated the same three dimensions as in Survey 1
     (fairness, merit, random chance). Column positions 13–15 contain the
     Study 1 (Meritocratic) ratings and positions 18–20 contain the
     Study 2 (Risk-only) ratings. The variables Q1, Q2, Q3 are routing
     indicators that map each column position to its dimension label
     ("fairness", "merit", "random chance").
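
     The routing logic can be sketched in Python as below. The mapping is
     an assumption, not taken from the replication code: positions are
     treated as 1-based (as in R), and Q1, Q2, Q3 are assumed to label the
     first, second, and third position of each triplet (13/18, 14/19,
     15/20). The function name label_ratings and the example values are
     hypothetical.

```python
# 1-based column positions of the rating triplets (assumed, see text).
POSITIONS = {
    "Meritocratic": (13, 14, 15),
    "Risk-only": (18, 19, 20),
}
# Assumption: Q1/Q2/Q3 give the dimension shown at the 1st/2nd/3rd position.
ROUTING_VARS = ("Q1", "Q2", "Q3")


def label_ratings(cells: list, routing: dict) -> dict:
    """Return {(treatment, dimension): rating} for one respondent.

    `cells` is the respondent's row in column order; `routing` holds Q1..Q3.
    """
    out = {}
    for treatment, positions in POSITIONS.items():
        for var, pos in zip(ROUTING_VARS, positions):
            dimension = routing[var]                      # e.g. "merit"
            out[(treatment, dimension)] = cells[pos - 1]  # 1-based -> 0-based
    return out


# Hypothetical respondent: 20 columns, only the rating positions filled.
cells = [None] * 20
cells[12:15] = [4, 6, 2]   # positions 13-15 (Meritocratic)
cells[17:20] = [3, 1, 7]   # positions 18-20 (Risk-only)
routing = {"Q1": "merit", "Q2": "fairness", "Q3": "random chance"}

ratings = label_ratings(cells, routing)
print(ratings[("Meritocratic", "fairness")])    # 6
print(ratings[("Risk-only", "random chance")])  # 7
```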

     Demographics for Survey 2 participants were not separately collected;
     only the rating data are included in survey2_data_anonymous.xlsx.


3. FILES IN THIS FOLDER
-----------------------

survey1_data_anonymous.xlsx
    Survey 1 ratings.
    Key columns:
      PROLIFIC_PID         — anonymized participant ID (transformed)
      Fairness 1.x_1       — fairness rating, Study 1, block x (x = 1,2,3)
      Merit 1.x_1          — merit rating, Study 1, block x
      Random chance 1.x_1  — random-chance rating, Study 1, block x
      Fairness 2.x_1       — fairness rating, Study 2, block x
      Merit 2.x_1          — merit rating, Study 2, block x
      Random chance 2.x_1  — random-chance rating, Study 2, block x

survey2_data_anonymous.xlsx
    Survey 2 ratings.
    Key columns:
      PROLIFIC_PID  — anonymized participant ID (transformed)
      Columns 13–15 — Meritocratic ratings by position
      Columns 18–20 — Risk-only ratings by position
      Q1, Q2, Q3   — routing labels ("fairness"/"merit"/"random chance")

demographics_survey1_anonymous.csv
    Prolific demographics for Survey 1 participants.
    Used by 03_code_survey.R to merge gender (Sex) and age into the
    analysis dataset for Figures A2 and A3.
    Key columns:
      Participant.id — anonymized participant ID (matches PROLIFIC_PID in survey1_data_anonymous.xlsx)
      Sex           — "Male" / "Female" (as recorded by Prolific)
      Age           — participant age in years
      (all other Prolific demographic variables are retained as exported,
      except the columns listed in Section 4)
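
    The merge itself is done in R by 03_code_survey.R. The Python sketch
    below only illustrates one way to attach Sex and Age to the survey
    rows, as a left join on the anonymized ID. It assumes the ID column in
    the demographics CSV is named Participant.id (as stated in Section 4)
    and that survey rows are dicts keyed by PROLIFIC_PID. Function names
    and example IDs are hypothetical.

```python
import csv
import io


def load_demographics(handle, id_col: str = "Participant.id") -> dict:
    """Index demographics rows by anonymized participant ID."""
    return {row[id_col]: row for row in csv.DictReader(handle)}


def merge_demographics(survey_rows: list, demographics: dict,
                       keep=("Sex", "Age")) -> list:
    """Left join: keep every survey row; fill None where no demographics match."""
    merged = []
    for row in survey_rows:
        demo = demographics.get(row["PROLIFIC_PID"], {})
        merged.append({**row, **{col: demo.get(col) for col in keep}})
    return merged


# In-memory stand-in for demographics_survey1_anonymous.csv (made-up IDs).
demo_csv = io.StringIO("Participant.id,Sex,Age\nid_a,Female,34\nid_b,Male,29\n")
demographics = load_demographics(demo_csv)

survey_rows = [{"PROLIFIC_PID": "id_a", "Fairness": 6},
               {"PROLIFIC_PID": "id_c", "Fairness": 3}]
for row in merge_demographics(survey_rows, demographics):
    print(row)
```

    With the real file, pass open("demographics_survey1_anonymous.csv",
    newline="") instead of the StringIO object. The csv module returns Age
    as text (e.g. '34'), so it needs converting before numeric analysis.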


4. ANONYMIZATION
----------------
The following information has been removed, by source file:

  From Qualtrics XLSX exports (survey1, survey2):
    StartDate, EndDate, IPAddress, RecordedDate, ResponseId,
    RecipientLastName, RecipientFirstName, RecipientEmail,
    ExternalReference, LocationLatitude, LocationLongitude,
    free-text open responses.

  From the Prolific demographics CSV:
    Submission.id, Custom.study.tncs.accepted.at, Started.at,
    Completed.at, Reviewed.at, Archived.at, Completion.code.
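
A quick way to confirm that none of the removed columns reappear is to
intersect a file's header with the lists above. The Python sketch below does
this for header rows already read into lists. It assumes the column names
appear exactly as listed here; for the XLSX files the header row would first
have to be read with a spreadsheet library (not shown). Free-text responses
have no fixed column names and are not covered. The function name is
hypothetical.

```python
import csv
import io

# Column names removed during anonymization, as listed in this section.
REMOVED_QUALTRICS = {
    "StartDate", "EndDate", "IPAddress", "RecordedDate", "ResponseId",
    "RecipientLastName", "RecipientFirstName", "RecipientEmail",
    "ExternalReference", "LocationLatitude", "LocationLongitude",
}
REMOVED_PROLIFIC = {
    "Submission.id", "Custom.study.tncs.accepted.at", "Started.at",
    "Completed.at", "Reviewed.at", "Archived.at", "Completion.code",
}


def leftover_sensitive_columns(header, removed) -> list:
    """Return (sorted) any header names that should have been removed."""
    return sorted(set(header) & removed)


# Header of a CSV read with the standard csv module (made-up content).
header = next(csv.reader(io.StringIO("Participant.id,Sex,Age\n")))
print(leftover_sensitive_columns(header, REMOVED_PROLIFIC))  # []
print(leftover_sensitive_columns(["PROLIFIC_PID", "IPAddress"], REMOVED_QUALTRICS))  # ['IPAddress']
```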

Participant IDs (PROLIFIC_PID in xlsx files; Participant.id in the
demographics CSV) have been transformed using a deterministic rule
so that IDs cannot be directly matched to the Prolific platform.
The same transformation is applied consistently across all three files
so the merge key remains valid. The transformation is also consistent
with the ID transformation applied to the main experiment data
(raw_data_anonymous folder), so cross-study participant matching
remains possible within this replication package.
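
The transformation rule itself is not documented here and is not reproduced
below. As a hedged illustration of how a deterministic rule can keep merge
keys valid while preventing direct matching, the Python sketch uses a keyed
hash (HMAC-SHA256 from the standard library). The key, the function name,
and the example IDs are all hypothetical.

```python
import hashlib
import hmac


def pseudonymize(pid: str, key: bytes) -> str:
    """Hypothetical deterministic ID transformation (keyed HMAC-SHA256).

    Same (pid, key) -> same output, so applying it to every file keeps
    joins on the transformed ID valid. Without the secret key the output
    cannot be recomputed from a known Prolific ID.
    """
    return hmac.new(key, pid.encode("utf-8"), hashlib.sha256).hexdigest()


KEY = b"example-secret-key"  # placeholder; a real key would be kept private

survey_id = pseudonymize("example_pid_001", KEY)  # e.g. from a survey XLSX
demo_id = pseudonymize("example_pid_001", KEY)    # same person in the CSV
print(survey_id == demo_id)                       # True: merge key preserved
print(survey_id == pseudonymize("example_pid_002", KEY))  # False
```

A keyed hash is used in this sketch rather than a plain hash because anyone
holding a participant's original Prolific ID could recompute an unkeyed hash
and match it.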

All rating variables and demographic variables not listed above are
provided exactly as exported from Qualtrics and Prolific.


5. EXCLUSIONS
-------------
Survey 1: N = 670 after removing two invalid observations
(participants who revoked consent to disclose gender information).
Survey 2: N = 170 after removing two invalid observations.
These observations remain in the distributed files; the replication code
(03_code_survey.R) drops them by their anonymized IDs.
The exclusion has no material effect on the main survey results.
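
The exclusion step can be sketched in Python as a filter on the anonymized
ID. The actual excluded IDs are defined in 03_code_survey.R and are not
repeated here; the IDs below are placeholders and the function name is
hypothetical. The sketch also raises an error if a listed ID is absent, which
guards against a stale exclusion list.

```python
def apply_exclusions(rows: list, excluded_ids, id_col: str = "PROLIFIC_PID") -> list:
    """Drop rows whose anonymized ID is in `excluded_ids`.

    Raises ValueError if any excluded ID does not occur in `rows`.
    """
    excluded = set(excluded_ids)
    present = {row[id_col] for row in rows}
    missing = excluded - present
    if missing:
        raise ValueError(f"excluded IDs not found: {sorted(missing)}")
    return [row for row in rows if row[id_col] not in excluded]


rows = [{"PROLIFIC_PID": pid} for pid in ("id_1", "id_2", "id_3", "id_4")]
kept = apply_exclusions(rows, ["id_2", "id_4"])
print([row["PROLIFIC_PID"] for row in kept])  # ['id_1', 'id_3']
```

Assuming one row per participant, applying the real exclusion lists should
leave 670 rows for Survey 1 and 170 for Survey 2.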


6. REPRODUCIBILITY
------------------
These files are the direct output of anonymize_survey.R (stored in
raw_survey_data/, not distributed). Re-running that script from the
original raw survey data produces identical output files.

The replication code that uses these files is:
  replication_code/03_code_survey.R
  — reproduces all in-text statistics in Section 4.1 of the paper
  — produces Figures A1–A4 (Appendix A)
