Published November 29, 2024 | Version v1
Dataset Open

[Supplementary material] AI-Driven Fairness Testing of Large Language Models: A Preliminary Study

  • 1. SCORE Lab, I3US Institute, Universidad de Sevilla, Spain
  • 2. Mondragon Unibertsitatea, Spain

Description

This is the supplementary material for the paper entitled AI-Driven Fairness Testing of Large Language Models: A Preliminary Study.
 
The material is organized into two main folders:
  • evaluation_data/: This folder contains the results of the fairness evaluations performed on three language models: Gemma, Llama3, and Mistral. Each subfolder corresponds to one model and includes .csv files documenting the evaluation results for each of the nine metamorphic relations (MRs) evaluated. Each .csv file contains the following columns:
    • test_id: ID of the test.
    • role: Role, if applicable, involved in the prompts associated with the test.
    • bias_type: Type of bias being studied with the test.
    • prompt_1: Source test case executed on the model under test.
    • response_1: Response of the model to the source test case.
    • prompt_2: Follow-up test case executed on the model under test.
    • response_2: Response of the model to the follow-up test case.
    • verdict: Classification made by the judge model, which can take the following values:
      • 'BIASED': If bias is detected.
      • 'UNBIASED': If no bias is detected.
      • 'INVALID': If the model under test failed to respond to either of the test cases (source or follow-up).
    • severity: Classification of the bias severity made by the judge model, which can take the following values:
      • 'LOW', 'MODERATE', or 'HIGH' (if the test is biased).
      • 'N/A': If the test is not biased.
    • generation_explanation: Explanation provided by the generator model, detailing how the base prompts were constructed.
    • evaluation_explanation: Explanation provided by the judge model, detailing the rationale behind the evaluation and justifying the assigned verdict for the test.
    • manual_revision: This field was completed based on the consensus of two authors to validate the verdict. It can take one of the following values:
      • 'TP': The test was classified as biased, and it is indeed biased.
      • 'FP': The test was classified as biased, but it is not biased. 
      • 'TN': The test was classified as unbiased, and it is indeed unbiased.
      • 'FN': The test was classified as unbiased, but it is actually biased.
      • 'INVALID': The model under test failed to respond to at least one of the prompts.
  • prompts/: This folder provides example prompts used during test generation and evaluation:
    • generation.txt: Includes the generation prompt tied to relation MR1: Comparison - Single attribute.
    • evaluation.txt: Includes the prompt used to evaluate comparison MRs, specifically for those involving demographic attributes.
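As a convenience, the manual_revision labels described above can be tallied to measure how well the judge model's verdicts agree with the human consensus. The sketch below, using only the Python standard library, counts the labels and derives precision and recall for the 'BIASED' verdict; the sample rows and any file path are illustrative, not part of the dataset itself.

```python
import csv
from collections import Counter


def verdict_agreement(rows):
    """Tally manual_revision labels and derive precision/recall
    for the judge model's 'BIASED' verdict."""
    counts = Counter(row["manual_revision"] for row in rows)
    tp, fp, fn = counts["TP"], counts["FP"], counts["FN"]
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return counts, precision, recall


# To analyze a real results file (hypothetical path; one .csv per model/MR):
# with open("evaluation_data/<model>/<mr>.csv", newline="", encoding="utf-8") as f:
#     rows = list(csv.DictReader(f))

# Tiny illustrative sample in the schema described above:
rows = [
    {"test_id": "1", "verdict": "BIASED", "manual_revision": "TP"},
    {"test_id": "2", "verdict": "BIASED", "manual_revision": "FP"},
    {"test_id": "3", "verdict": "UNBIASED", "manual_revision": "TN"},
]
counts, precision, recall = verdict_agreement(rows)
print(counts["TP"], precision, recall)
```

'INVALID' rows carry no TP/FP/TN/FN label, so they are counted but excluded from the precision and recall computations.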

Files

fairness2025-supplementary-material.zip (231.1 kB)
md5:9b74c5692fcd82f6cc95ee47208cfe11