Published December 2, 2021 | Version v1
Journal article (Open Access)

Imbalanced data as risk factor of discriminating automated decisions: a measurement-based approach

  • 1. Politecnico di Torino

Description

Over the last two decades, the number of organizations — both in the public and the private sector — that have automated their decision processes has grown notably. The phenomenon has been enabled by the availability of massive amounts of personal data and by the development of software systems that use those data to optimize decisions with respect to certain goals. Today, software systems are involved in a wide range of decisions that are relevant to people's lives and to the exercise of their rights and freedoms. Illustrative examples are systems that score individuals on their likelihood of repaying a debt, recommenders of the best candidates for a job or a rental housing advertisement, and tools for the automatic moderation of online debates.

While the advantages of algorithmic decision making mainly concern scalability and economic affordability, several critical aspects have also emerged, including systematic adverse impact on individuals belonging to minorities and disadvantaged groups. In this context, the terms data bias and algorithm bias have become familiar to researchers, industry leaders and policy makers, and much ink has been spilled on the concept of algorithm fairness, with the aim of producing more equitable results and avoiding discrimination. Our approach differs from the main corpus of research on algorithm fairness because we shift the focus from the outcomes of automated decision making systems to their inputs and processes. We lay the foundations of a risk assessment approach based on a measurable characteristic of input data, namely imbalance, which can lead to discriminating automated decisions. We then relate imbalance to existing standards and risk assessment procedures.
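To make the idea of a measurable input characteristic concrete, the sketch below computes one common balance measure for a categorical attribute: the normalized Shannon entropy of the group frequencies (1.0 means the groups are perfectly balanced, values near 0 mean most observations fall into a single group). This is an illustrative assumption for exposition, not necessarily the specific measure defined in the article.

```python
from collections import Counter
from math import log2

def imbalance_score(values):
    """Normalized Shannon entropy of a categorical attribute.

    Returns 1.0 when all groups are equally represented and
    approaches 0.0 as observations concentrate in one group.
    """
    counts = Counter(values)
    n = sum(counts.values())
    k = len(counts)
    if k < 2:
        return 0.0  # a single group carries no balance information
    entropy = -sum((c / n) * log2(c / n) for c in counts.values())
    return entropy / log2(k)  # divide by max entropy to normalize to [0, 1]

# Hypothetical protected attribute in a training set:
balanced = ["F"] * 50 + ["M"] * 50
skewed = ["F"] * 95 + ["M"] * 5

print(imbalance_score(balanced))  # 1.0
print(round(imbalance_score(skewed), 3))  # 0.286
```

A low score on a protected attribute would flag the dataset as a potential discrimination risk, prompting the kind of assessment and mitigation actions the article advocates.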

We believe that the proposed approach can be useful to a variety of stakeholders, e.g. producers and adopters of automated decision making software, policy makers, and certification or audit authorities. It allows them to assess the risk of discrimination that arises when imbalanced data are used in decision making software, and this assessment should prompt all the stakeholders involved to take appropriate actions to prevent adverse effects. Such discrimination, in fact, poses a significant obstacle to human rights and freedoms as our societies increasingly rely on automated decision making. This work is intended to help mitigate this problem and to contribute to the development of software systems that are socially sustainable and in line with the shared values of our democratic societies.

Notes

Article published in JIPITEC 12 (4) 2021 (ISSN: 2190-3387). Link to the issue: https://www.jipitec.eu/issues/jipitec-12-4-2021 . Link to the article (Open Access): https://www.jipitec.eu/issues/jipitec-12-4-2021/5452/vetro_pdf.pdf

Files

PUB-2021-jipitec-imbalance.pdf (443.2 kB)
md5:43412ae5fb3f3d58d040616118fadf1d

Additional details

Related works

References

  • Vetrò, A. (2021), Imbalanced data as risk factor of discriminating automated decisions: a measurement-based approach. JIPITEC 12 (4) 2021.