Published May 18, 2025 | Version v1
Working paper · Open access

The Prototype Fairness Illusion: Why Prototype-Based Fair Representations Fail on Image Data

  • Indian Institute of Technology Madras

Description

This record contains a working paper studying the behavior of prototype-based fairness methods when applied to high-dimensional visual data.

Abstract

Prototype-based fair representation learning, introduced by Zemel et al. (2013) through Learning Fair Representations (LFR), has demonstrated strong fairness guarantees on tabular data by enforcing statistical parity in prototype assignments. In this work, we systematically study what happens when this framework is extended to high-dimensional image data.

We make three primary contributions:

  1. Geometric analysis of vanilla LFR on images.
    We provide both a theoretical argument and empirical evidence that vanilla LFR fails fundamentally on image data: Euclidean distance in pixel space is not semantically meaningful for faces, so prototype assignments become nearly uniform. As a result, the fairness constraint L_z approaches zero without actually removing sensitive-attribute information from the learned representation.

  2. Deep Semantic LFR (DS-LFR).
    We propose an extension that moves the prototypes into a semantically meaningful latent space learned by a convolutional encoder and replaces the pixel-wise reconstruction loss with a VGG-based perceptual loss. These modifications substantially improve classification performance, raising accuracy from 51.95% to 92.08% on the CelebA benchmark.

  3. Fairness Activation Threshold.
    We identify a previously unreported optimization phenomenon. During early training (epochs 1–14), the model exhibits complete collapse, with L_z = 0 and L_y = 0.693 ≈ ln 2 (random guessing). At epoch 15, a sharp phase transition occurs in which the classification and fairness objectives activate simultaneously.
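The mechanism analyzed in contribution 1 can be sketched concretely. In LFR, each example receives a soft assignment over prototypes via a softmax on negative squared Euclidean distances, and the fairness term L_z penalizes differences in mean prototype occupancy between the sensitive groups. A minimal NumPy sketch (function names and the unit softmax temperature are illustrative, not the paper's code):

```python
import numpy as np

def soft_assignments(X, V):
    """M[n, k] = softmax_k(-||x_n - v_k||^2): LFR-style soft assignment
    of example x_n to prototype v_k (Zemel et al., 2013)."""
    d = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1)  # squared distances
    logits = -d
    logits = logits - logits.max(axis=1, keepdims=True)  # stabilize softmax
    M = np.exp(logits)
    return M / M.sum(axis=1, keepdims=True)

def parity_loss(M, s):
    """L_z: sum over prototypes of |mean occupancy in group s=1
    minus mean occupancy in group s=0|."""
    return np.abs(M[s == 1].mean(axis=0) - M[s == 0].mean(axis=0)).sum()
```

When the distances are computed in raw pixel space they carry little semantic signal, so the assignment matrix drifts toward uniformity across groups and L_z can vanish even though the representation still encodes the sensitive attribute.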

Despite the architectural improvements, sensitive-attribute accuracy (sAcc) remains approximately 0.926 across all hyperparameter configurations, i.e., the protected attribute is still almost fully decodable from the representation. This demonstrates that statistical parity over prototype assignments is a fundamentally weaker criterion than representation-level disentanglement.
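The sAcc metric corresponds to an attribute probe: train a classifier to predict the sensitive attribute from the learned representation and read its accuracy as a measure of leaked information. A minimal logistic-regression probe, assuming representations Z and binary labels s (a hypothetical helper, not the paper's evaluation code):

```python
import numpy as np

def probe_sacc(Z, s, lr=0.5, epochs=300):
    """Fit a logistic-regression probe predicting the sensitive attribute s
    from representations Z; high accuracy means s is still linearly
    decodable, accuracy near chance (0.5) means it is not."""
    Zb = np.hstack([Z, np.ones((len(Z), 1))])          # append bias column
    w = np.zeros(Zb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-np.clip(Zb @ w, -30.0, 30.0)))
        w -= lr * Zb.T @ (p - s) / len(s)              # BCE gradient step
    return ((Zb @ w > 0).astype(int) == s).mean()
```

A nonlinear probe (e.g., a small MLP) gives a stronger test, but even a linear probe suffices to show the leakage the paper reports.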

These findings suggest that the L_z objective, regardless of its weighting, cannot remove protected-attribute information from useful visual representations. Instead, adversarial disentanglement mechanisms are likely required to achieve true representation-level fairness in image models.
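One standard instantiation of such adversarial disentanglement is the gradient-reversal approach of Ganin and Lempitsky (2015): an adversary is trained to predict the sensitive attribute from the representation, while the encoder receives the negated gradient of the adversary's loss, so information about s is actively pushed out of z. A one-round linear sketch under these assumptions (all names illustrative, not the paper's method):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def grl_round(X, s, W, a, lr=0.1):
    """One adversarial round with a linear encoder z = XW and a linear
    adversary predicting s from z via sigmoid(z @ a). The adversary
    descends its binary cross-entropy loss; the encoder *ascends* it
    (gradient reversal), degrading the adversary's ability to recover s."""
    n = len(s)
    z = X @ W
    p = sigmoid(z @ a)
    g = (p - s) / n                                  # dBCE / dlogit
    a_new = a - lr * (z.T @ g)                       # adversary: descent
    W_new = W + lr * (X.T @ (g[:, None] * a[None, :]))  # encoder: reversed
    return W_new, a_new
```

In practice both players are deep networks and the reversal is applied through a gradient-reversal layer during backpropagation, but the opposing update directions shown here are the core of the mechanism.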

Dataset: CelebA (Liu et al., 2015)
Code and experiments: https://www.kaggle.com/code/proprak01/prototype-fairness-main-image-data

DOI: https://doi.org/10.5281/zenodo.19016833

Files

LEARNING_FAIR_REPRESENTATIONS.pdf (19.5 MB, md5:210ad9f5b3e05eaab7d56930293a264d)

Additional details

References

  • Zemel, R., Wu, Y., Swersky, K., Pitassi, T., & Dwork, C. (2013). Learning Fair Representations. In Proceedings of ICML.
  • Liu, Z., Luo, P., Wang, X., & Tang, X. (2015). Deep Learning Face Attributes in the Wild. In Proceedings of ICCV.