Published January 11, 2026 | Version v2
[Supplementary material] Meta-Fair: AI-Assisted Fairness Testing of Large Language Models
Description
This repository contains supplementary material for the paper Meta-Fair: AI-Assisted Fairness Testing of Large Language Models. It includes data, scripts, and resources to reproduce and analyse the experiments described in the paper.
The contents are organised into four main folders:
- data/: Data for the metamorphic tests, structured by research question:
  - attributes_catalogue.csv: Catalogue of attributes used in metamorphic test generation.
  - rq1/, rq2/, rq3/: One subfolder per research question, each containing:
    - generation/: CSV files of generated test data for each metamorphic relation.
    - execution/: CSV files with the execution results of the metamorphic tests.
    - evaluation/: Evaluation-related data, such as experiments and judgements.
    - manual_revision/: Data manually revised by human judges.
- experimental_setup/: The setup required to generate, execute, and evaluate the metamorphic tests:
  - configuration/: JSON files (metamorphic_relations.json, rq1.json, rq2.json, rq3.json) defining the configuration for each research question.
  - jobs/: Subfolders (rq1/, rq2/, rq3/) containing job configurations for task execution.
  - scripts/: Python scripts (evaluation.py, execution.py, generation.py, experiment.py) that automate the generation, execution, and evaluation processes.
  - tools/: Source code of the three tools developed for LLM-assisted generation (MUSE), execution (GENIE), and evaluation (GUARD-ME).
  - requirements.txt: Dependencies needed to run the experiments.
- prompt_templates/: The prompt templates used to generate and evaluate the metamorphic tests:
  - base_generation.txt: Base prompt template for generation tasks.
  - base_evaluation.txt: Base prompt template for evaluation tasks.
  - generation_derivates/: Prompt templates (dual_attributes.txt, hypothetical_scenario.txt, metal.txt, multiple_choice.txt, prioritisation.txt, proper_nouns.txt, ranked_list.txt, score.txt, sentence_completion.txt, single_attribute.txt, yes_no_question.txt) derived from base_generation.txt, each implementing a different generation strategy.
  - evaluation_derivates/: Prompt templates (attribute_comparison.txt, inverted_consistency.txt, proper_nouns_comparison.txt) derived from base_evaluation.txt, used to guide the judge model under different evaluation methods.
- analysis/: Resources for analysing and visualising the experimental results:
  - results_analysis.ipynb: Jupyter notebook for data analysis, visualisation, and supplementary experiments; it enables interactive analysis of the results and supports reproducibility.
  - requirements.txt: Dependencies required to run the notebook environment.
  - outputs/: Figures (figures/), tables (tables/), and statistical tests (statistical_tests/) summarising the experimental findings.
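As a quick sanity check on the repository data, a short script could summarise the verdicts recorded in one of the execution CSVs. The following is a minimal sketch only: the column names (test_id, relation, verdict) and verdict values are hypothetical and may not match the actual schema of the files in data/*/execution/.

```python
import csv
import io

# Hypothetical excerpt of an execution CSV; the real files under
# data/rq1/execution/ may use different column names and values.
SAMPLE = """\
test_id,relation,verdict
t1,inverted_consistency,pass
t2,inverted_consistency,fail
t3,attribute_comparison,pass
"""

def failure_rate(csv_text: str) -> float:
    """Fraction of metamorphic tests whose verdict is 'fail'."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    fails = sum(1 for row in rows if row["verdict"] == "fail")
    return fails / len(rows)

# One of the three hypothetical tests fails.
print(f"failure rate: {failure_rate(SAMPLE):.2f}")
```

For the real data, the same function could be applied to each file in an execution/ folder (e.g. via pathlib.Path.glob) to compare failure rates across metamorphic relations.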
Files

| Name | Size | MD5 checksum |
|---|---|---|
| meta-fair-supplementary-material.zip | 120.8 MB | 9a09287d235a848276777b9043ee97c0 |