Published December 2, 2021 | Version v1
Journal article (Open Access)

Imbalanced data as risk factor of discriminating automated decisions: a measurement-based approach

  • 1. Politecnico di Torino

Description

Over the last two decades, the number of organizations — both in the public and the private sector — that have automated their decision processes has grown notably. The phenomenon has been enabled by the availability of massive amounts of personal data and by the development of software systems that use those data to optimize decisions with respect to certain goals. Today, software systems are involved in a wide range of decisions that are relevant to people's lives and to the exercise of their rights and freedoms. Illustrative examples are systems that score individuals on their likelihood of repaying a debt, recommenders of the best candidates for a job or a rental housing advertisement, and tools for the automatic moderation of online debates.

While the advantages of algorithmic decision making mainly concern scalability and economic affordability, several critical aspects have also emerged, including systematic adverse impact on individuals belonging to minorities and disadvantaged groups. In this context, the terms data bias and algorithm bias have become familiar to researchers, industry leaders and policy makers, and much ink has been spilled on the concept of algorithm fairness, with the aim of producing more equitable results and avoiding discrimination. Our approach differs from the main corpus of research on algorithm fairness because we shift the focus from the outcomes of automated decision making systems to their inputs and processes. We lay the foundations of a risk assessment approach based on a measurable characteristic of input data, namely imbalance, which can lead to discriminating automated decisions. We then relate imbalance to existing standards and risk assessment procedures.
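To make the idea of a measurable input characteristic concrete, the sketch below computes one common balance measure for a categorical attribute: the normalized Shannon entropy of the group frequencies (1.0 means the groups are perfectly balanced, values near 0 mean most observations fall into a single group). This is an illustrative assumption for exposition, not necessarily the specific measure defined in the article.

```python
from collections import Counter
from math import log2

def imbalance_score(values):
    """Normalized Shannon entropy of a categorical attribute.

    Returns 1.0 when all groups are equally represented and
    approaches 0.0 as observations concentrate in one group.
    """
    counts = Counter(values)
    n = sum(counts.values())
    k = len(counts)
    if k < 2:
        return 0.0  # a single group carries no balance information
    entropy = -sum((c / n) * log2(c / n) for c in counts.values())
    return entropy / log2(k)  # divide by max entropy to normalize to [0, 1]

# Hypothetical protected attribute in a training set:
balanced = ["F"] * 50 + ["M"] * 50
skewed = ["F"] * 95 + ["M"] * 5

print(imbalance_score(balanced))  # 1.0
print(round(imbalance_score(skewed), 3))  # 0.286
```

A low score on a protected attribute would flag the dataset as a potential discrimination risk, prompting the kind of assessment and mitigation actions the article advocates.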

We believe that the proposed approach can be useful to a variety of stakeholders, e.g. producers and adopters of automated decision making software, policy makers, and certification or audit authorities. It allows them to assess the risk of discrimination that arises when imbalanced data are used in decision making software, and this assessment should prompt all the stakeholders involved to take appropriate actions to prevent adverse effects. Such discrimination, in fact, poses a significant obstacle to human rights and freedoms as our societies increasingly rely on automated decision making. This work is intended to help mitigate this problem and to contribute to the development of software systems that are socially sustainable and in line with the shared values of our democratic societies.

Notes

Article published in JIPITEC 12 (4) 2021 (ISSN: 2190-3387). Link to the issue: https://www.jipitec.eu/issues/jipitec-12-4-2021 . Link to the article (Open Access): https://www.jipitec.eu/issues/jipitec-12-4-2021/5452/vetro_pdf.pdf

Files

PUB-2021-jipitec-imbalance.pdf (443.2 kB)
md5:43412ae5fb3f3d58d040616118fadf1d

Additional details

Related works

References

  • Vetrò, A. (2021), Imbalanced data as risk factor of discriminating automated decisions: a measurement-based approach. JIPITEC 12 (4) 2021.