A Controlled Experiment in Age and Gender Bias When Reading Technical Articles in Software Engineering
- 1. Vanderbilt University
- 2. Google
- 3. University of Michigan
Description
The "Pilot+Survey_Data.zip" file contains data collected from 27 participants. The pilot survey was designed to determine which profile images are most likely to appear as author images for technical articles. In the enclosed CSV file, the question encoding is organized as follows:
- The first two letters represent the age and gender of the group of profile pictures: YM for younger males, YF for younger females, MM for middle-aged males, MF for middle-aged females, OM for older males, and OF for older females.
- VX (where X is a digit from 0 to 9) identifies the group of profile pictures.
- The last digit is the number assigned to each profile picture within the group (see the example stimuli in Figure 2 of the paper).
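The encoding above can be decoded programmatically. The following is a minimal sketch, not part of the published scripts. It assumes a column label such as `OF_V2_3` (age/gender letters, then `V` and the group digit, then the picture digit, optionally separated by a space, underscore, or hyphen). The exact separators used in the CSV are an assumption, and `parse_pilot_code` and `PilotCode` are hypothetical names.

```python
import re
from typing import NamedTuple

# First letter: age group; second letter: gender (per the encoding described above).
AGE = {"Y": "younger", "M": "middle-aged", "O": "older"}
GENDER = {"M": "male", "F": "female"}


class PilotCode(NamedTuple):
    age: str
    gender: str
    group: int    # the X in "VX"
    picture: int  # number of the picture within the group


# Assumed layout: two letters, "V" + one digit, one digit; separators are optional.
_PATTERN = re.compile(r"^([YMO])([MF])[ _-]?V(\d)[ _-]?(\d)$")


def parse_pilot_code(code: str) -> PilotCode:
    """Decode a pilot-survey question code into its components."""
    match = _PATTERN.match(code.strip())
    if match is None:
        raise ValueError(f"Unrecognized pilot question code: {code!r}")
    age, gender, group, picture = match.groups()
    return PilotCode(AGE[age], GENDER[gender], int(group), int(picture))


if __name__ == "__main__":
    print(parse_pilot_code("OF_V2_3"))
```

If the actual column labels differ from this assumed layout, only `_PATTERN` should need adjusting.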
The "Screening+Survey_Data.zip" file contains data from over 5000 prospective participants. The screening survey was designed to verify participants' programming experience and uses the screening questions suggested by Danilova, et al. For details, please see the original publication:
- Danilova, Anastasia, et al. "Do you really code? designing and evaluating screening questions for online surveys with programmers." 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). IEEE, 2021.
The "Formal+Survey_Data.zip" file contains data collected from 540 participants. The formal survey was designed to evaluate potential biases against technical article authors based on their age or gender. In the enclosed CSV file, the question number is organized as follows:
- The first digit of each question is used for survey flow purposes to prevent participants from getting the same article text twice (e.g., question 101-XX has the same article text as question 201-XX but with a different author profile picture).
- The subsequent two digits represent the age and gender of the author's profile picture, with 1-5 for YM, 6-10 for YF, 11-15 for MM, 16-20 for MF, 21-25 for OM, and 26-30 for OF.
- The last two digits represent the question number corresponding to Q1-Answerability, Q2-Content Depth, and Q3-Understandability.
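The question numbering above can be unpacked in a few lines. The following is a minimal sketch, not part of the published scripts. It assumes question numbers are written like `101-02` (one flow digit, two picture digits, a hyphen, two question digits), that the picture ranges do not overlap (15 belongs to MM and MF covers 16-20), and that the final two digits 01, 02, and 03 correspond to Q1, Q2, and Q3. `parse_question_number` is a hypothetical name.

```python
import re

# Picture-index ranges for each age/gender group (assumed non-overlapping).
GROUP_RANGES = [
    (range(1, 6), "YM"),
    (range(6, 11), "YF"),
    (range(11, 16), "MM"),
    (range(16, 21), "MF"),
    (range(21, 26), "OM"),
    (range(26, 31), "OF"),
]

# Measures named in the description; the 01/02/03 mapping is an assumption.
MEASURES = {1: "Answerability", 2: "Content Depth", 3: "Understandability"}


def parse_question_number(qid: str) -> dict:
    """Split a formal-survey question number such as '101-02' into its parts."""
    match = re.fullmatch(r"(\d)(\d{2})-(\d{2})", qid.strip())
    if match is None:
        raise ValueError(f"Unrecognized question number: {qid!r}")
    flow_block, picture_index, question = (int(part) for part in match.groups())

    group = next((name for rng, name in GROUP_RANGES if picture_index in rng), None)
    if group is None:
        raise ValueError(f"Picture index out of range 1-30: {picture_index}")
    if question not in MEASURES:
        raise ValueError(f"Question number out of range 1-3: {question}")

    return {
        "flow_block": flow_block,        # survey-flow digit
        "picture_index": picture_index,  # 1-30
        "group": group,                  # e.g. "YM", "OF"
        "measure": MEASURES[question],
    }


if __name__ == "__main__":
    print(parse_question_number("101-02"))
```

Returning a plain dictionary keeps the result easy to expand into columns when reshaping the CSV for analysis.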
The "Articles_Pool.xlsx" file contains 169 trending Medium.com articles from 11 different categories, including programming, software development, software engineering, technology, artificial intelligence, machine learning, deep learning, python, computer vision, image processing, and object detection.
The "Final_Survey_Design.xlsx" file contains 30 randomly selected articles from the pool of 169 articles, where each article is mapped to 6 different profile images and used as the textual stimuli for the final survey.
The "Survey.pdf" file is a Qualtrics-generated PDF of the entire survey. We used the "Survey Flow" feature provided by Qualtrics to randomly assign 6 stimuli to each participant following the study design.
The "clean.py" and "script.r" files are the data cleaning and analysis scripts respectively.
Files
(42.4 MB)
MD5 checksum | Size |
---|---|
7331ec30db2478dbe77223f450936000 | 17.1 kB |
4f4d12653c4596ffd2cf55f7566eb496 | 3.0 kB |
815aaf7bc6ecc33936eed4c482472c31 | 14.9 kB |
dd9111bdd34198907d6db17436aa1877 | 58.5 kB |
0ab36a5f06513d6e0fca051673fc25d8 | 3.7 kB |
16271431a9d17208ccb9a1c0910fbfe1 | 413.4 kB |
234ef74aa4522271b7fe4d0c58ed0e82 | 3.1 kB |
8e4021c06aef47749a01bf88072379b8 | 41.9 MB |
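The MD5 checksums in the listing above can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `verify_download` are hypothetical, and the file path in the usage example is a placeholder, since the listing does not say which checksum belongs to which file.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Check a downloaded file against a published MD5 checksum."""
    return md5_of_file(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    # Placeholder path and checksum: substitute the file you downloaded
    # and the checksum listed for it.
    target = Path("downloaded_file.zip")
    if target.exists():
        print(verify_download(target, "7331ec30db2478dbe77223f450936000"))
```

Reading in chunks keeps memory use flat, which matters for the larger files in the record.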