Published April 27, 2025 | Version v2
Dataset Open

Replication Package: Tracking the Moving Target: A Framework for Continuous Evaluation of LLM Test Generation in Industry

  • 1. University of the Basque Country (UPV/EHU) - Faculty of Computer Science
  • 2. LKS Next, S.Coop.

Description

Version: 1.0 (Date: April 27, 2025)
DOI: https://doi.org/10.5281/zenodo.14779767

Paper Information

Title: Tracking the Moving Target: A Framework for Continuous Evaluation of LLM Test Generation in Industry
Authors: Maider Azanza, Beatriz Pérez Lamancha, Eneko Pizarro
Publication: International Conference on Evaluation and Assessment in Software Engineering (EASE), 2025 edition.

Package Overview

This repository contains the replication package for the research paper cited above. It provides the necessary data, source code, and prompts to understand, verify, and potentially extend our findings on the continuous evaluation of LLM-based test generation in an industrial context. The data reflects evaluations conducted between November 2024 and January 2025.

Package Contents

  1. Metrics-Results-by-Function.7z (Archive, requires 7-Zip or compatible tool to extract)

    • Description: Contains the detailed, raw, and processed metric results for each of the 7 Java methods and classes evaluated in the study.
    • Structure: Inside this archive, you will find 7 individual .zip files, one for each function (e.g., addUser-Metrics-Results.zip, assemble-Metrics-Results.zip, ...).
    • Contents (per function zip): Each function-specific zip file typically includes:
      • Raw test cases generated by the evaluated LLMs.
      • Metric measurements (e.g., code coverage reports from SonarQube/JaCoCo).
      • Analysis or intermediate conclusions specific to that function.
      • The specific prompt variations used for that function, if applicable beyond the main prompt.
    • Purpose: Allows for in-depth analysis of LLM performance on specific methods and verification of the metric collection process described in the paper. The data were collected between November 2024 and January 2025.
  2. Metric Results by function Nov. 2024 - Jan.2025.pdf (PDF Document)

    • Description: Provides a consolidated tabular view of the key raw metrics collected for each function and LLM evaluated during the November 2024 - January 2025 period.
    • Contents: Tables summarizing metrics like code coverage, number of generated tests, expert assessment scores, etc., broken down by function and LLM. This data is directly derived from the detailed results in Metrics-Results-by-Function.7z.
    • Purpose: Offers a more detailed quantitative overview than the aggregated summary, facilitating direct comparison of raw performance metrics across functions and LLMs without needing to extract all archives.
  3. Aggregated Results by function Nov. 2024 - Jan.2025.pdf (PDF Document)

    • Description: Presents a high-level summary of the evaluation results across all tested methods and LLMs.
    • Contents: Includes an aggregated metric table showing overall performance trends, potentially including the weighted metrics discussed in the paper.
    • Purpose: Provides a quick overview of the main findings and comparative performance of the LLMs according to the evaluation framework.
  4. Prompt_for_Integration_Testing-2025.pdf (PDF Document)

    • Description: The final, refined version of the prompt provided to the LLMs for generating integration test cases.
    • Contents: Details the instructions, context (including source code snippets or descriptions), constraints, and desired output format given to the LLMs. Reflects the prompt-chaining methodology described in the paper.
    • Purpose: Enables understanding of how the LLMs were instructed and allows others to reuse or adapt the prompt engineering approach.
  5. sources.tar.gz (Compressed Tar Archive, requires tar or compatible tool to extract)

    • Description: Contains the original Java source code for the 7 methods that were the targets for test generation.
    • Contents:
      • The specific Java files containing the methods under test.
      • Relevant context or dependency information needed to understand the methods' functionality and complexity.
      • May include documentation (e.g., Javadoc) describing the intended behavior of each method.
    • Purpose: Provides the necessary code context for understanding the test generation task and potentially replicating the test execution or analysis.
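
The unpacking steps implied by the list above can be sketched in Python. This is a minimal illustration under stated assumptions, not part of the package: the function names extract_sources and extract_function_zips are hypothetical, and the sketch assumes Metrics-Results-by-Function.7z has already been unpacked into a folder with 7-Zip, since the Python standard library does not read .7z archives. The internal layout of the archives is not documented here, so the zip search is recursive.

```python
import tarfile
import zipfile
from pathlib import Path


def extract_sources(archive: Path, dest: Path) -> list[str]:
    """Extract a .tar.gz archive (e.g. sources.tar.gz) into dest.

    Returns the member names so the caller can see what was unpacked.
    """
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        # Use the safer "data" extraction filter where this Python supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest, **kwargs)
    return names


def extract_function_zips(folder: Path, dest: Path) -> dict[str, list[str]]:
    """Unpack every .zip found under folder into its own subdirectory of dest.

    Assumption: folder is where the .7z archive was already extracted.
    Returns a mapping from zip name (without extension) to its file list.
    """
    contents: dict[str, list[str]] = {}
    for zip_path in sorted(folder.rglob("*.zip")):
        target = dest / zip_path.stem
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(target)
            contents[zip_path.stem] = zf.namelist()
    return contents


# Example calls (paths are placeholders for wherever the files were saved):
# extract_sources(Path("sources.tar.gz"), Path("sources"))
# extract_function_zips(Path("Metrics-Results-by-Function"), Path("per-function"))
```

Giving each per-function zip its own subdirectory keeps results for different methods from overwriting one another if the archives happen to share file names.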

Files (24.3 MB)

Checksum                               Size
md5:c44581a812d8b1c9dc0a2698bfac57d8   68.8 kB
md5:94417531fa0ab6677a7925585031602b   66.5 kB
md5:7256976c8a9c354d818394bcbae10f0b   11.0 MB
md5:d59177d666165b08c5e7304081b4d679   77.7 kB
md5:ba9c9555b7e0d26e5736b16d0fd699a9   4.5 kB
md5:291e3253646b0522624c89607dad9a76   13.0 MB
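
A downloaded file can be checked against the MD5 values in the listing above. The following is a minimal sketch, assuming the files have been saved locally; the names LISTED_CHECKSUMS, md5_of, and is_listed are hypothetical. Because the listing does not pair each checksum with a file name, the sketch only tests whether a file's digest appears among the listed values.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this record.
LISTED_CHECKSUMS = [
    "md5:c44581a812d8b1c9dc0a2698bfac57d8",
    "md5:94417531fa0ab6677a7925585031602b",
    "md5:7256976c8a9c354d818394bcbae10f0b",
    "md5:d59177d666165b08c5e7304081b4d679",
    "md5:ba9c9555b7e0d26e5736b16d0fd699a9",
    "md5:291e3253646b0522624c89607dad9a76",
]


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed(path: Path, listed=LISTED_CHECKSUMS) -> bool:
    """Check whether the file's digest matches any entry in the listing."""
    expected = {entry.strip().lower().removeprefix("md5:") for entry in listed}
    return md5_of(path) in expected


# Example call (the path is a placeholder for a locally saved download):
# is_listed(Path("sources.tar.gz"))
```

MD5 is used here only because it is the checksum the listing provides; it detects corrupted or incomplete downloads but is not a defence against deliberate tampering.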