There is a newer version of the record available.

Published March 27, 2023 | Version v3
Journal article | Open access

A Controlled Experiment in Age and Gender Bias When Reading Technical Articles in Software Engineering

  • 1. Vanderbilt University
  • 2. Google
  • 3. University of Michigan

Description

The "Pilot+Survey_Data.zip" file contains data collected from 27 participants and was used to determine which profile images are most likely to appear as author images for technical articles. In the enclosed CSV file, the question codes are organized as follows:

  • The first two letters represent the age and gender of the group of profile pictures: YM for younger males, YF for younger females, MM for middle-aged males, MF for middle-aged females, OM for older males, and OF for older females.
  • VX (where X is a digit from 0 to 9) identifies the group of profile pictures.
  • The last digit is the number assigned to each profile picture within its group (see the example stimuli in Figure 2 of the paper).
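A minimal Python sketch for decoding these pilot-survey codes into their parts. The exact string layout (for example "YMV12", or "YM_V1_2" with separators) is an assumption about how the CSV columns are named, and the names `parse_pilot_code` and `PilotCode` are hypothetical.

```python
import re
from typing import NamedTuple

# Age and gender letters as described in the record.
AGE = {"Y": "younger", "M": "middle-aged", "O": "older"}
GENDER = {"M": "male", "F": "female"}

# Assumed layout: age letter, gender letter, "V" + group digit, then the
# picture digit, with optional "_" or "-" separators between the parts.
PILOT_CODE = re.compile(r"^([YMO])([MF])[_-]?V(\d)[_-]?(\d)$")


class PilotCode(NamedTuple):
    age: str
    gender: str
    group: int    # X in "VX": the profile-picture group number
    picture: int  # last digit: the picture's number within its group


def parse_pilot_code(code: str) -> PilotCode:
    """Split a pilot-survey question code into age, gender, group, picture."""
    match = PILOT_CODE.match(code.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized pilot code: {code!r}")
    age, gender, group, picture = match.groups()
    return PilotCode(AGE[age], GENDER[gender], int(group), int(picture))


print(parse_pilot_code("OFV3_2"))
# PilotCode(age='older', gender='female', group=3, picture=2)
```

Because the age letter always comes first and the gender letter second, a code such as "MM" is unambiguous even though "M" is used for both "middle-aged" and "male".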

The "Screening+Survey_Data.zip" file contains data from more than 5,000 prospective participants and was used to verify participants' programming experience. This screening survey is adapted from one developed by Danilova et al. and uses the same questions. For details, please see the original publication:

  • Danilova, Anastasia, et al. "Do you really code? Designing and evaluating screening questions for online surveys with programmers." 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). IEEE, 2021.

The "Formal+Survey_Data.zip" file contains data collected from 540 participants and was used to evaluate potential biases related to the age or gender of technical article authors. In the enclosed CSV file, the question numbers are organized as follows:

  • The first digit of each question number is used for survey-flow purposes, to prevent participants from seeing the same article text twice (e.g., question 101-XX has the same article text as question 201-XX but a different author profile picture).
  • The next two digits represent the age and gender of the author's profile picture: 1-5 for YM, 6-10 for YF, 11-15 for MM, 16-20 for MF, 21-25 for OM, and 26-30 for OF.
  • The last two digits give the question number: Q1 (Answerability), Q2 (Content Depth), and Q3 (Understandability).
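A minimal Python sketch that decodes a formal-survey question number by reading the three bullets above literally. The string format ("101-01" or "10101") and the assumption that the last two digits run from 01 to 03 are guesses not confirmed by the record; `parse_formal_code` and `FormalCode` are hypothetical names.

```python
import re
from typing import NamedTuple

# Picture-index ranges from the record: 1-5 YM, 6-10 YF, ..., 26-30 OF.
GROUPS = ("YM", "YF", "MM", "MF", "OM", "OF")
QUESTIONS = {1: "Answerability", 2: "Content Depth", 3: "Understandability"}

# Assumed layout: flow digit + two-digit picture index, an optional "-" or
# "_", then a two-digit question number.
FORMAL_CODE = re.compile(r"^(\d)(\d{2})[_-]?(\d{2})$")


class FormalCode(NamedTuple):
    flow_block: int  # first digit: Survey Flow block
    picture: int     # two-digit author profile-picture index (1-30)
    group: str       # age/gender group derived from the picture index
    question: str    # Q1-Q3 label


def parse_formal_code(code: str) -> FormalCode:
    match = FORMAL_CODE.match(code.strip())
    if match is None:
        raise ValueError(f"unrecognized formal-survey code: {code!r}")
    flow, picture, question = (int(part) for part in match.groups())
    if not 1 <= picture <= 30:
        raise ValueError(f"picture index out of range: {picture}")
    if question not in QUESTIONS:
        raise ValueError(f"unknown question number: {question}")
    # Each group spans five consecutive indices, so integer division selects it.
    group = GROUPS[(picture - 1) // 5]
    return FormalCode(flow, picture, group, f"Q{question}-{QUESTIONS[question]}")


print(parse_formal_code("101-01"))
```

The `(picture - 1) // 5` step maps indices 1-5 to position 0 (YM), 16-20 to position 3 (MF), and 26-30 to position 5 (OF), matching the ranges listed above.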

The "Articles_Pool.xlsx" file contains 169 trending Medium.com articles from 11 categories: programming, software development, software engineering, technology, artificial intelligence, machine learning, deep learning, Python, computer vision, image processing, and object detection.

The "Final_Survey_Design.xlsx" file contains 30 randomly selected articles from the pool of 169 articles, where each article is mapped to 6 different profile images and used as the textual stimuli for the final survey.
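A hypothetical Python sketch of the selection step: it draws 30 distinct article indices from the 169-article pool and pairs each with six author-image slots, giving 30 × 6 = 180 article-image stimuli. Treating the six images as one per age/gender group (YM to OF) is an assumption, and the seeded sampling is illustrative rather than the procedure the authors used.

```python
import random
from itertools import product
from typing import List, Optional, Tuple

POOL_SIZE = 169   # articles in Articles_Pool.xlsx
N_SELECTED = 30   # articles used as textual stimuli in the final survey
# Assumption: the six profile images per article cover the six age/gender groups.
GROUPS = ("YM", "YF", "MM", "MF", "OM", "OF")


def build_stimuli(seed: Optional[int] = None) -> List[Tuple[int, str]]:
    """Sample 30 distinct article indices and pair each with all six groups."""
    rng = random.Random(seed)
    articles = sorted(rng.sample(range(POOL_SIZE), N_SELECTED))
    # Cartesian product: every selected article appears once per group.
    return list(product(articles, GROUPS))


stimuli = build_stimuli(seed=2023)
print(len(stimuli))  # 180 = 30 articles x 6 images
```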

The "Survey.pdf" file is a Qualtrics-generated PDF of the entire survey. We used the Qualtrics "Survey Flow" feature to randomly assign 6 stimuli to each participant, following the study design.
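The Qualtrics Survey Flow logic itself is not reproduced here. The following hypothetical Python sketch shows one assignment scheme consistent with the description: each participant receives 6 stimuli built from 6 distinct articles (so no article text repeats), each paired with a different age/gender group. Whether the actual design balanced the groups this way is an assumption, and `assign_stimuli` is an illustrative name.

```python
import random
from typing import Iterable, List, Tuple

GROUPS = ("YM", "YF", "MM", "MF", "OM", "OF")
STIMULI_PER_PARTICIPANT = 6


def assign_stimuli(article_ids: Iterable[int], rng: random.Random) -> List[Tuple[int, str]]:
    """Hypothetical scheme: 6 distinct articles, each with a distinct group."""
    # Sampling without replacement ensures no article text is shown twice.
    articles = rng.sample(list(article_ids), STIMULI_PER_PARTICIPANT)
    # A random permutation of the six groups, one per article.
    groups = rng.sample(GROUPS, len(GROUPS))
    return list(zip(articles, groups))


rng = random.Random(7)
for participant in range(3):
    print(participant, assign_stimuli(range(30), rng))
```

Pairing distinct articles with a permutation of the groups means every participant sees each author group exactly once under this scheme, which is one way to avoid confounding a group with a particular article within a participant.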

The "clean.py" and "script.r" files are the data cleaning and analysis scripts respectively.

Files (42.4 MB)

  • 17.1 kB (md5:7331ec30db2478dbe77223f450936000)
  • 3.0 kB (md5:4f4d12653c4596ffd2cf55f7566eb496)
  • 14.9 kB (md5:815aaf7bc6ecc33936eed4c482472c31)
  • 58.5 kB (md5:dd9111bdd34198907d6db17436aa1877)
  • 3.7 kB (md5:0ab36a5f06513d6e0fca051673fc25d8)
  • 413.4 kB (md5:16271431a9d17208ccb9a1c0910fbfe1)
  • 3.1 kB (md5:234ef74aa4522271b7fe4d0c58ed0e82)
  • 41.9 MB (md5:8e4021c06aef47749a01bf88072379b8)
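Downloaded files can be checked against the MD5 checksums listed above. A minimal sketch using Python's standard `hashlib` (Python 3.9+ for `str.removeprefix`); the function names `md5sum` and `verify` are illustrative.

```python
import hashlib
from pathlib import Path
from typing import Union


def md5sum(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Union[str, Path], expected: str) -> bool:
    """Compare a file against a checksum string of the form "md5:<hex>"."""
    return md5sum(path) == expected.removeprefix("md5:").lower()
```

Call `verify()` with the local path and the checksum string shown next to that file's size; a `False` result suggests an incomplete or corrupted download. Reading in chunks keeps memory use flat even for the 41.9 MB archive.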