Replication Package: Tracking the Moving Target: A Framework for Continuous Evaluation of LLM Test Generation in Industry
- 1. University of the Basque Country (UPV/EHU) - Faculty of Computer Science
- 2. LKS Next, S.Coop.
Description
Version: 1.0 (Date: April 27, 2025)
DOI: https://doi.org/10.5281/zenodo.14779767
Paper Information
Title: Tracking the Moving Target: A Framework for Continuous Evaluation of LLM Test Generation in Industry
Authors: Maider Azanza, Beatriz Pérez Lamancha, Eneko Pizarro
Publication: International Conference on Evaluation and Assessment in Software Engineering (EASE), 2025 edition.
Package Overview
This repository contains the replication package for the research paper cited above. It provides the necessary data, source code, and prompts to understand, verify, and potentially extend our findings on the continuous evaluation of LLM-based test generation in an industrial context. The data reflects evaluations conducted between November 2024 and January 2025.
Package Contents
- Metrics-Results-by-Function.7z (Archive, requires 7-Zip or compatible tool to extract)
  - Description: Contains the detailed raw and processed metric results for each of the 7 Java methods and classes evaluated in the study.
  - Structure: Inside this archive, you will find 7 individual .zip files, one for each function (e.g., addUser-Metrics-Results.zip, assemble-Metrics-Results.zip, ...).
  - Contents (per function zip): Each function-specific zip file typically includes:
    - Raw test cases generated by the evaluated LLMs.
    - Metric measurements (e.g., code coverage reports from SonarQube/JaCoCo).
    - Analysis or intermediate conclusions specific to that function.
    - The specific prompt variations used for that function, if applicable beyond the main prompt.
  - Purpose: Allows for in-depth analysis of LLM performance on specific methods and verification of the metric collection process described in the paper. The data were collected between November 2024 and January 2025.
- Metric Results by function Nov. 2024 - Jan.2025.pdf (PDF Document)
  - Description: Provides a consolidated tabular view of the key raw metrics collected for each function and LLM evaluated during the November 2024 - January 2025 period.
  - Contents: Tables summarizing metrics like code coverage, number of generated tests, expert assessment scores, etc., broken down by function and LLM. This data is directly derived from the detailed results in Metrics-Results-by-Function.7z.
  - Purpose: Offers a more detailed quantitative overview than the aggregated summary, facilitating direct comparison of raw performance metrics across functions and LLMs without needing to extract all archives.
- Aggregated Results by function Nov. 2024 - Jan.2025.pdf (PDF Document)
  - Description: Presents a high-level summary of the evaluation results across all tested methods and LLMs.
  - Contents: Includes an aggregated metric table showing overall performance trends, potentially including the weighted metrics discussed in the paper.
  - Purpose: Provides a quick overview of the main findings and comparative performance of the LLMs according to the evaluation framework.
- Prompt_for_Integration_Testing-2025.pdf (PDF Document)
  - Description: The final, refined version of the prompt provided to the LLMs for generating integration test cases.
  - Contents: Details the instructions, context (including source code snippets or descriptions), constraints, and desired output format given to the LLMs. Reflects the prompt-chaining methodology described in the paper.
  - Purpose: Enables understanding of how the LLMs were instructed and allows others to reuse or adapt the prompt engineering approach.
- sources.tar.gz (Compressed Tar Archive, requires tar or compatible tool to extract)
  - Description: Contains the original Java source code for the 7 methods that were the targets for test generation.
  - Contents:
    - The specific Java files containing the methods under test.
    - Relevant context or dependency information needed to understand the methods' functionality and complexity.
    - May include documentation (e.g., Javadoc) describing the intended behavior of each method.
  - Purpose: Provides the necessary code context for understanding the test generation task and potentially replicating the test execution or analysis.
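
As a convenience for readers who prefer scripting the unpacking step, the following is a minimal Python sketch that uses only the standard library. It assumes that sources.tar.gz has been downloaded and that Metrics-Results-by-Function.7z has already been extracted with 7-Zip into a local folder (the standard library has no 7z reader). The function names, the folder arguments, and the one-subdirectory-per-zip layout are illustrative assumptions, not part of the package.

```python
import tarfile
import zipfile
from pathlib import Path


def extract_sources(archive: Path, dest: Path) -> None:
    """Unpack a gzip-compressed tar archive (e.g. sources.tar.gz) into dest."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Recent Python versions can reject unsafe member paths.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)


def extract_function_zips(folder: Path, dest: Path) -> list[str]:
    """Unpack every .zip found under folder into its own subdirectory of dest.

    Returns the names of the subdirectories created (the zip file stems).
    """
    extracted = []
    for zip_path in sorted(folder.rglob("*.zip")):
        target = dest / zip_path.stem
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(target)
        extracted.append(zip_path.stem)
    return extracted
```

For example, `extract_function_zips(Path("Metrics-Results-by-Function"), Path("results"))` would place the contents of addUser-Metrics-Results.zip under `results/addUser-Metrics-Results`, assuming the 7z archive was unpacked into a folder of that (hypothetical) name.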
Files
Total size: 24.3 MB

| MD5 checksum | Size |
|---|---|
| c44581a812d8b1c9dc0a2698bfac57d8 | 68.8 kB |
| 94417531fa0ab6677a7925585031602b | 66.5 kB |
| 7256976c8a9c354d818394bcbae10f0b | 11.0 MB |
| d59177d666165b08c5e7304081b4d679 | 77.7 kB |
| ba9c9555b7e0d26e5736b16d0fd699a9 | 4.5 kB |
| 291e3253646b0522624c89607dad9a76 | 13.0 MB |
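
The MD5 checksums listed above can be used to confirm that a download is intact. The sketch below is one possible way to do this in Python with the standard library. Because the listing as preserved here does not pair checksums with file names, the check only tests whether a file's digest appears among the six listed values; the names `LISTED_MD5`, `md5_of`, and `is_listed` are illustrative and not part of the package.

```python
import hashlib
from pathlib import Path

# The six checksums from the file listing above.
LISTED_MD5 = {
    "c44581a812d8b1c9dc0a2698bfac57d8",
    "94417531fa0ab6677a7925585031602b",
    "7256976c8a9c354d818394bcbae10f0b",
    "d59177d666165b08c5e7304081b4d679",
    "ba9c9555b7e0d26e5736b16d0fd699a9",
    "291e3253646b0522624c89607dad9a76",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed(path: Path) -> bool:
    """True if the file's MD5 digest matches one of the listed checksums."""
    return md5_of(path) in LISTED_MD5
```

A call such as `is_listed(Path("sources.tar.gz"))` returning `False` would suggest a corrupted or incomplete download. MD5 is suitable here only as an integrity check, not as protection against deliberate tampering.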