Published November 29, 2024 | Version v1
Dataset Open

[Supplementary material] AI-Driven Fairness Testing of Large Language Models: A Preliminary Study

  • 1. SCORE Lab, I3US Institute, Universidad de Sevilla, Spain
  • 2. Mondragon Unibertsitatea, Spain

Description

This is the supplementary material for the paper entitled AI-Driven Fairness Testing of Large Language Models: A Preliminary Study.
 
The material is organized into two main folders:
  • evaluation_data/: This folder contains the results of the fairness evaluations performed on three language models: Gemma, Llama3, and Mistral. Each subfolder corresponds to one model and includes .csv files documenting the evaluation results for each of the nine metamorphic relations (MRs) evaluated. Each .csv file contains the following columns:
    • test_id: ID of the test.
    • role: Role, if applicable, involved in the prompts associated with the test.
    • bias_type: Type of bias being studied with the test.
    • prompt_1: Source test case executed on the model under test.
    • response_1: Response of the model to the source test case.
    • prompt_2: Follow-up test case executed on the model under test.
    • response_2: Response of the model to the follow-up test case.
    • verdict: Classification made by the judge model, which can take the following values:
      • 'BIASED': If bias is detected.
      • 'UNBIASED': If no bias is detected.
      • 'INVALID': If the model under test failed to respond to either of the test cases (source or follow-up).
    • severity: Classification of the bias severity made by the judge model, which can take the following values:
      • 'LOW', 'MODERATE', or 'HIGH' (if the test is biased).
      • 'N/A': If the test is not biased.
    • generation_explanation: Explanation provided by the generator model, detailing how the base prompts were constructed.
    • evaluation_explanation: Explanation provided by the judge model, detailing the rationale behind the evaluation and justifying the assigned verdict for the test.
    • manual_revision: This field was completed based on the consensus of two authors to validate the verdict. It can take one of the following values:
      • 'TP': The test was classified as biased, and it is indeed biased.
      • 'FP': The test was classified as biased, but it is not biased. 
      • 'TN': The test was classified as unbiased, and it is indeed unbiased.
      • 'FN': The test was classified as unbiased, but it is actually biased.
      • 'INVALID': The model under test failed to respond to at least one of the prompts.
  • prompts/: This folder provides example prompts used during test generation and evaluation:
    • generation.txt: Includes the generation prompt tied to relation MR1: Comparison - Single attribute.
    • evaluation.txt: Includes the prompt used to evaluate comparison MRs, specifically for those involving demographic attributes.
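As a convenience, the manual_revision labels described above can be tallied to measure how well the judge model's verdicts agree with the human consensus. The sketch below, using only the Python standard library, counts the labels and derives precision and recall for the 'BIASED' verdict; the sample rows and any file path are illustrative, not part of the dataset itself.

```python
import csv
from collections import Counter


def verdict_agreement(rows):
    """Tally manual_revision labels and derive precision/recall
    for the judge model's 'BIASED' verdict."""
    counts = Counter(row["manual_revision"] for row in rows)
    tp, fp, fn = counts["TP"], counts["FP"], counts["FN"]
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return counts, precision, recall


# To analyze a real results file (hypothetical path; one .csv per model/MR):
# with open("evaluation_data/<model>/<mr>.csv", newline="", encoding="utf-8") as f:
#     rows = list(csv.DictReader(f))

# Tiny illustrative sample in the schema described above:
rows = [
    {"test_id": "1", "verdict": "BIASED", "manual_revision": "TP"},
    {"test_id": "2", "verdict": "BIASED", "manual_revision": "FP"},
    {"test_id": "3", "verdict": "UNBIASED", "manual_revision": "TN"},
]
counts, precision, recall = verdict_agreement(rows)
print(counts["TP"], precision, recall)
```

'INVALID' rows carry no TP/FP/TN/FN label, so they are counted but excluded from the precision and recall computations.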

Files

fairness2025-supplementary-material.zip (231.1 kB)
md5:9b74c5692fcd82f6cc95ee47208cfe11