Published April 28, 2025 | Version v1
Dataset | Open Access

Replication package for "The Art of Repair: Optimizing Iterative Program Repair with Instruction-Tuned Models"

  • Simula Research Laboratory

Description

This repository contains the replication package for the paper "The Art of Repair: Optimizing Iterative Program Repair with Instruction-Tuned Models" by Fernando Vallecillos Ruiz, Max Hort, and Leon Moonen, accepted for the research track of the 29th International Conference on Evaluation and Assessment in Software Engineering (EASE 2025). A preprint of the paper is included.

The source code is distributed under the MIT license, and except for third-party datasets that come with their own licenses, all documentation, data, models, and results in this repository are distributed under the CC BY 4.0 license.

Repository Overview

This repository contains the scripts, data, and resources needed to replicate the experiments presented in our conference paper. It is organized so that researchers can reproduce our results, conduct similar analyses, or build upon our work.

Repository Structure

  • analysis/: Jupyter notebook scripts used to generate tables and visual analyses. They assist in visualizing results, comparing metrics, and summarizing data from the experiments; the outputs can easily be exported for further use.
  • apr_training/: The dataset used for the Automated Program Repair (APR) training phase. This data is used by the scripts in train_src/ for fine-tuning the models.
  • benchmarks/: JSON files representing the benchmarks, specifically HumanEval-Java and Defects4J. In this work, we primarily focused on and revised HumanEval-Java.
  • inference_and_validation_src/: Python scripts used to generate patches and validate them across the benchmarks. These scripts play a central role in producing and assessing model outputs.
  • inference_scripts/: Bash scripts that automate submitting inference and validation jobs to the compute cluster, streamlining multiple iterations of inference and validation.
  • models/ (*): The fine-tuned models used in the experiments. These models are the output of the fine-tuning process and are referenced by the inference scripts.
  • results/: All model outputs in JSON format, generated during the inference process. These files represent the raw experimental results.
  • train_src/: Python scripts for model fine-tuning, covering both full model training and parameter-efficient LoRA fine-tuning.
  • validation_benchmark_dataset/: The benchmark datasets used during validation.

(*) Note that all contents except the model files in the models/ folder are included in the compressed zip file in this Zenodo repository. The model files are uploaded separately to facilitate individual downloads, as several of them are relatively large (9.5-11.2 GB).

Detailed Folder Descriptions

Analysis (analysis/)

This folder contains the Jupyter notebook scripts used to generate tables and visual analyses of the experimental data. The scripts help visualize results, compare performance metrics, and summarize experimental outcomes; the generated tables can be exported to spreadsheets for further processing or visualization. The outputs help validate the experiments' consistency and provide insight into the performance of the various model configurations.
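
As a trivial illustration of the export step (the notebooks' actual code may differ, and the column names and values here are placeholders), a summary table held in a pandas DataFrame can be written to CSV for spreadsheet use:

    # Hypothetical export of a summary table; column names and values are
    # placeholders, not results from the paper.
    import pandas as pd

    summary = pd.DataFrame({"model": ["A", "B"], "plausible_patches": [10, 12]})
    summary.to_csv("summary.csv", index=False)  # opens directly in spreadsheet tools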

Inference and Validation Source (inference_and_validation_src/)

The Python scripts in this folder generate patches and validate them against the predefined benchmarks. We use the Python Fire library to parse command-line parameters and dispatch to the relevant methods; a minimal sketch of this pattern follows the list below. This folder contains:

  • Scripts for generating patches directly from the benchmark data or using iterative approaches.
  • Validation utilities for Defects4J and HumanEval benchmarks to ensure the generated patches are functional and comply with benchmark requirements.
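
For orientation, the sketch below shows the general Fire calling pattern; the function name and parameters are hypothetical and do not reflect the actual script interfaces.

    # Minimal sketch of a Fire-based entry point; generate_patches and its
    # parameters are hypothetical, not the real interface of these scripts.
    import fire

    def generate_patches(benchmark_path: str, model_name: str, num_outputs: int = 1):
        """Generate num_outputs candidate patches per bug in the given benchmark."""
        print(f"{model_name}: {num_outputs} patches per bug from {benchmark_path}")

    if __name__ == "__main__":
        fire.Fire(generate_patches)

Fire exposes the function's arguments as command-line flags, e.g. python script.py --benchmark_path=... --model_name=... --num_outputs=10.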

Key components include:

  • Patch generation logic.
  • Validation commands for HumanEval and Defects4J benchmarks.
  • Utilities to verify data integrity of generated JSON files.
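
The sketch below illustrates the kind of integrity check this involves; the required field names are assumptions, not the actual schema of the generated result files.

    # Hypothetical JSON integrity check; the required keys are assumptions.
    import json

    def verify_results_file(path, required_keys=("bug_id", "patches")):
        """Return True if the file parses as a JSON array of records and
        every record contains the required keys."""
        try:
            with open(path) as f:
                records = json.load(f)  # expects a JSON array of result records
        except (OSError, json.JSONDecodeError):
            return False
        return all(all(key in record for key in required_keys) for record in records)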

Training Source (train_src/)

This folder contains the scripts used for model fine-tuning:

  • full_finetune.py: Performs full fine-tuning of a model on a given training dataset, updating all trainable parameters to adapt the model to the target task.

  • lora_finetune.py: Implements LoRA (Low-Rank Adaptation) fine-tuning, a parameter-efficient approach in which the base model is frozen and only small low-rank adapter matrices are trained, making it well suited to resource-constrained settings.
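
For orientation, the sketch below shows the core of a LoRA setup with the Hugging Face transformers and peft libraries; the model name, rank, and target modules are illustrative and are not the settings used in lora_finetune.py.

    # Minimal LoRA sketch with Hugging Face peft; all values are illustrative.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    model = AutoModelForCausalLM.from_pretrained("base-model-name")  # placeholder name
    lora_config = LoraConfig(
        r=8,                                  # rank of the low-rank update matrices
        lora_alpha=16,                        # scaling factor for the updates
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # attention projections; model-dependent
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)  # freezes the base model weights
    model.print_trainable_parameters()          # typically well under 1% trainable

The wrapped model can then be trained with a standard training loop; only the adapter weights receive gradient updates.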

Inference Scripts (inference_scripts/)

These Bash scripts are designed to automate the inference process by submitting multiple iterations of inference and validation jobs to the compute cluster. The scripts create job dependencies, ensuring that all necessary tasks are completed in a logical sequence.

The available inference scripts include:

  • model_inferencing_adjustable_FULL_d4j_big.sh: Executes inference for specified model configurations with multiple iterations and outputs per iteration.
  • model_inferencing_adjustable_FULL_d4j_lora_big.sh: Similar to the previous script, but optimized for LoRA-based models.

These scripts accept three parameters:

  • MODEL: The name of the model, as found in the models/ folder.
  • NUM_ITERATIONS: The number of iterations to run.
  • NUM_OUTPUTS: The number of outputs generated in each iteration.
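
For example, assuming the cluster runs a Slurm-style scheduler, an invocation might look like sbatch model_inferencing_adjustable_FULL_d4j_big.sh MODEL_NAME 10 5 (10 iterations with 5 outputs each, as illustrative values); consult the script headers for the exact submission mechanism and argument order, which depend on the cluster configuration.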

Citation and Zenodo links

We hope this package serves as a useful resource for reproducing and expanding upon our research results. Please cite this work by referring to the published paper:

Fernando Vallecillos Ruiz, Max Hort, and Leon Moonen, 2025. The Art of Repair: Optimizing Iterative Program Repair with Instruction-Tuned Models. In Proceedings of the 29th International Conference on Evaluation and Assessment in Software Engineering (EASE 2025), ACM, 12 pages.

@inproceedings{ruiz2025:art,
    title = {{The Art of Repair: Optimizing Iterative Program Repair with 
              Instruction-Tuned Models}},
    author = {Ruiz, Fernando Vallecillos and Hort, Max and Moonen, Leon},
    booktitle = {{Proceedings of the 29th International Conference on Evaluation 
                  and Assessment in Software Engineering (EASE)}},
    year = {2025},
    pages = {12},
    publisher = {{ACM}},
    language = {en}
}

The replication package is archived on Zenodo with DOI: 10.5281/zenodo.15294695.


Notes

Acknowledgement

This work is supported by the Research Council of Norway through the secureIT project (IKTPLUSS #288787), and by the European Union through the Horizon Europe Marie Skłodowska-Curie Actions (#101151798). The empirical evaluation made use of the Experimental Infrastructure for Exploration of Exascale Computing (eX3), financially supported by the Research Council of Norway under contract #270053. In addition, we acknowledge Sigma2, Norway for awarding this project access to the LUMI supercomputer, owned by the EuroHPC Joint Undertaking, hosted by CSC (Finland), and the LUMI consortium through the Research Council of Norway.

Files (90.6 GB)

The deposit consists of The_Art_of_Repair_-_code_data_results.zip (code, data, and results) together with the separately uploaded model files. MD5 checksums and sizes:

  • md5:4ed37123f3fc3307ec47a9a4328a118b (9.5 GB)
  • md5:b9758384bb1ba226efad71897006fed6 (9.5 GB)
  • md5:a2c3f2f07eacc8e897f6cf041cd1b20e (9.5 GB)
  • md5:34fe210e5f8b530294d73c3d19112409 (18.6 MB)
  • md5:0ffa22b3838dace956a8199744cea01a (17.8 MB)
  • md5:b0b8cce97789e837ee351165c5aa0e43 (6.3 MB)
  • md5:4b110f77d27ff33c098b67b08d9dae14 (9.5 GB)
  • md5:bd6b7ddb8411541abd5e60ae8a3354ed (9.5 GB)
  • md5:49210a1f390019381be8607ce3cd9cf4 (9.5 GB)
  • md5:939c6469fec3acb930c513b098f0573f (18.3 MB)
  • md5:a433843dbecd0bfc751f32bc4d7e9977 (17.3 MB)
  • md5:a59ffabe3c46fb1d85ac2b0fdf5fbc1d (5.8 MB)
  • md5:5557a97fa510c8dd976cf1839850f195 (11.2 GB)
  • md5:cbd79bb3e655f85b619ac1b4657fcfa7 (11.2 GB)
  • md5:d485a4441ab6aec0acfcedfc870e00d3 (11.2 GB)
  • md5:21fb8b3bd4ad249721d01c61d93c111e (6.4 MB)
  • md5:b39b4067a5e46affba58364dfe8dccea (6.4 MB)
  • md5:f048fb9dea37fbbfc12599e3e75b602e (6.4 MB)
  • md5:8c4129ac02624816f931ad57dd4f8be4 (211.9 MB)

Additional details

Funding

  • The Research Council of Norway: secureIT (#288787)
  • European Commission: condenSE - Sustainable Training of Code Language Models through Data Refinement (#101151798)
  • The Research Council of Norway: eX3 (#270053)