Published November 29, 2024 | Version v1 | Dataset | Open access
[Supplementary material] AI-Driven Fairness Testing of Large Language Models: A Preliminary Study
Creators
1. SCORE Lab, I3US Institute, Universidad de Sevilla, Spain
2. Mondragon Unibertsitatea, Spain
Description
This is the supplementary material for the paper entitled AI-Driven Fairness Testing of Large Language Models: A Preliminary Study.
The material is organized into two main folders:
- evaluation_data/: This folder contains the results of the fairness evaluations performed on three language models: Gemma, Llama3, and Mistral. Each subfolder corresponds to a specific model and includes detailed .csv files documenting the evaluation results across the nine metamorphic relations (MRs) evaluated. Each .csv file contains the following columns (a short sketch showing how to load and summarize these files appears after this list):
- test_id: ID of the test.
- role: Role, if any, used in the prompts associated with the test.
- bias_type: Type of bias being studied with the test.
- prompt_1: Source test case executed on the model under test.
- response_1: Response of the model to the source test case.
- prompt_2: Follow-up test case executed on the model under test.
- response_2: Response of the model to the follow-up test case.
- verdict: Classification made by the judge model, which can take the following values:
- 'BIASED': If bias is detected.
- 'UNBIASED': If no bias is detected.
- 'INVALID': If the model under test failed to respond to at least one of the test cases (source or follow-up).
- severity: Classification of the bias severity made by the judge model, which can take the following values:
- 'LOW', 'MODERATE', or 'HIGH': If the test is biased.
- 'N/A': If the test is not biased.
- generation_explanation: Explanation provided by the generator model, detailing how the base prompts were constructed.
- evaluation_explanation: Explanation provided by the judge model, detailing the rationale behind the evaluation and justifying the assigned verdict for the test.
- manual_revision: Manual validation of the verdict, completed by consensus of two of the authors. It can take one of the following values:
- 'TP': The test was classified as biased, and it is indeed biased.
- 'FP': The test was classified as biased, but it is not biased.
- 'TN': The test was classified as unbiased, and it is indeed unbiased.
- 'FN': The test was classified as unbiased, but it is actually biased.
- 'INVALID': The model under test failed to respond to at least one of the prompts.
- prompts/: This folder provides example prompts used during test generation and evaluation:
- generation.txt: Includes the prompt tied to the relation MR1: Comparison - Single attribute.
- evaluation.txt: Includes the prompt used to evaluate comparison MRs, specifically for those involving demographic attributes.
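As an illustration, the following minimal sketch (Python with pandas) loads one of the evaluation .csv files and summarizes the judge verdicts and their agreement with the two-author manual revision. The column names come from the description above; the concrete file path is an assumption, since the exact file names inside each model subfolder are not listed here.

```python
# Minimal sketch: summarize one evaluation file. The path below is
# hypothetical -- adjust it to the actual file names in evaluation_data/.
import pandas as pd

df = pd.read_csv("evaluation_data/gemma/mr1.csv")  # hypothetical path

# Distribution of judge verdicts: 'BIASED', 'UNBIASED', or 'INVALID'.
print(df["verdict"].value_counts())

# Agreement between the judge model and the manual revision, using the
# TP/FP/TN/FN labels described above (INVALID tests are excluded).
valid = df[df["manual_revision"] != "INVALID"]
counts = valid["manual_revision"].value_counts()
tp, fp = counts.get("TP", 0), counts.get("FP", 0)
tn, fn = counts.get("TN", 0), counts.get("FN", 0)

precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
accuracy = (tp + tn) / len(valid) if len(valid) else 0.0
print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")
```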
Files

| Name | Size | MD5 checksum |
|---|---|---|
| fairness2025-supplementary-material.zip | 231.1 kB | 9b74c5692fcd82f6cc95ee47208cfe11 |