Using Demographic Data as Predictor Variables: a Questionable Choice
- 1. University of Pennsylvania, USA
- 2. Google, USA
- 3. University of Wisconsin, USA
Description
Predictive analytics methods in education are seeing widespread use and are producing increasingly accurate predictions of students’ outcomes. With the increased use of predictive analytics comes increasing concern about fairness for specific subgroups of the population. One approach that has been proposed to increase fairness is to use demographic variables directly in models as predictors. In this paper we explore issues of fairness in the use of demographic variables as predictors of long-term student outcomes, and we examine the arguments for and against this practice within the contexts in which they have been published. We analyze arguments for the inclusion of demographic variables, specifically claims that this approach improves model performance and charges that excluding such variables amounts to a form of ‘color-blind’ racism. We also consider arguments against including demographic variables as predictors, including reduced actionability of predictions, risk of reinforcing bias, and limits of categorization. We then discuss how contextual factors of predictive models should influence case-specific decisions to include or exclude demographic variables, and we discuss the role of proxy variables. We conclude that, on balance, fairness benefits more when demographic variables are used to validate fairness than when they are used as predictors within models.
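The abstract's conclusion, that demographic variables are better used to validate fairness than as model inputs, can be illustrated with a minimal sketch. It assumes a binary outcome, uses AUC as the performance metric, and computes that metric separately for each demographic subgroup; the group labels are used only at evaluation time and are never passed to the model. The function names (`auc`, `subgroup_auc`, `max_auc_gap`) and the toy data are hypothetical illustrations, not the paper's method or any particular library's API.

```python
from collections import defaultdict
from typing import Dict, Hashable, Optional, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> Optional[float]:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    Returns None when only one class is present, since AUC is then undefined.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


def subgroup_auc(labels, scores, groups) -> Dict[Hashable, Optional[float]]:
    """AUC computed separately within each demographic subgroup.

    `groups` is used only here, for validation; it is never a model input.
    """
    by_group = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        by_group[g][0].append(y)
        by_group[g][1].append(s)
    return {g: auc(ys, ss) for g, (ys, ss) in by_group.items()}


def max_auc_gap(per_group) -> float:
    """Largest AUC difference between any two subgroups with a defined AUC."""
    values = [v for v in per_group.values() if v is not None]
    return max(values) - min(values) if len(values) >= 2 else 0.0


if __name__ == "__main__":
    # Hypothetical toy data: observed outcomes, model scores, group label per student.
    y = [1, 0, 1, 0, 1, 0, 1, 0]
    scores = [0.9, 0.2, 0.8, 0.4, 0.6, 0.7, 0.3, 0.5]
    group = ["A", "A", "A", "A", "B", "B", "B", "B"]
    per_group = subgroup_auc(y, scores, group)
    print(per_group)               # {'A': 1.0, 'B': 0.25}
    print(max_auc_gap(per_group))  # 0.75
```

A large gap between subgroups, as in the toy data, would flag the model for closer inspection even though no demographic variable was used to produce the scores. AUC is only one possible choice here; other subgroup metrics could be substituted in the same structure.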
Files
- 619Baker22To52.pdf (966.8 kB, md5:87caf95b9626f11a573213bcaf3d22db)
Additional details
Related works
- Is published in
  - Journal article: https://jedm.educationaldatamining.org/index.php/JEDM/article/view/619