Artifact of LLM-Guided Genetic Improvement: Envisioning Semantic Aware Automated Software Evolution

Even Mendoza, Karine; Brownlee, Alexander; Geiger, Alina; Hanna, Carol; Petke, Justyna; Sarro, Federica; Sobania, Dominik

doi:10.5281/zenodo.15834984

Published July 8, 2025 | Version ASE 2025 V1

Dataset Open

1. King's College London
2. University of Stirling
3. University College London
4. Johannes Gutenberg University

Abstract:

We propose the investigation of a new research line on AI-powered genetic improvement (GI) aimed at incorporating semantic-aware search. We take a first step toward it by augmenting GI with automated clustering of LLM edits. We provide initial empirical evidence that our proposal, dubbed PatchCat, allows us to automatically and effectively categorize LLM-suggested patches. PatchCat identified 18 different types of software patches and categorized newly suggested patches with high accuracy. It also enabled detecting NoOp edits in advance and, prospectively, skipping test suite execution to save resources in many cases. These results, coupled with the fact that PatchCat works with small, local LLMs, are a promising step toward interpretable, efficient, and green GI.

Note: This is the artifact of the ASE NIER 2025 publication.

Examples for RQ2 (full example text):

Reasons for inconsistency in tagging:

Case of different prioritization by the model and humans:
The model ranks the importance of changes differently than humans do. For example, one entry was tagged Category #12 by the model and #9 by a human, who added the note:

"12 could also be considered but 9 is more important. Also makes changes to the return, but not mentioned in the description",

for the following short 15-word description generated by the LLM:

"A Java code diff with 4 changes: catches ParseException, adds variable, and updates logic".

Case of incomplete or unclear summaries:
The short 15-word description can fail to describe all changes or can be unclear. For example, when the "if" statement was modified:

125d124
<         // Fri, 21 Nov 1997 09:55:06 -0600
127c126,129
<         final SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
---
>         final Locale locale = Locale.ENGLISH;
>         final SimpleDateFormat format = new SimpleDateFormat(pattern, locale);
>         // assume no header date by default
>         boolean hasHeaderDate = false;
129a132
>             hasHeaderDate = true;
133a137,140
>         if (hasHeaderDate) {
>             // add a newline after the date field
>             header.append("\n");
>         }

but this was not clear from the summary:

"SimpleDateFormat constructor and locale usage changed, with additional logic for header date detection".

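The diff in the example above uses the normal diff format: each change command, such as 127c126,129, gives a line range in the old file, an operation (a for add, c for change, d for delete), and a line range in the new file. As a rough cross-check of how many edits a patch contains, such commands can be counted. The following is a minimal Python sketch; the names `Hunk`, `parse_hunk_header`, and `count_operations` are hypothetical and not part of the artifact.

```python
import re
from typing import NamedTuple, Optional

# Normal-diff change command: "<old range><a|c|d><new range>", e.g. "127c126,129".
_HUNK = re.compile(r"^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$")


class Hunk(NamedTuple):
    old_start: int
    old_end: int
    op: str  # "a" = add, "c" = change, "d" = delete
    new_start: int
    new_end: int


def parse_hunk_header(line: str) -> Optional[Hunk]:
    """Return a Hunk for a change-command line, or None for any other line."""
    match = _HUNK.match(line.strip())
    if match is None:
        return None
    o1, o2, op, n1, n2 = match.groups()
    # A single line number stands for a one-line range.
    return Hunk(int(o1), int(o2 or o1), op, int(n1), int(n2 or n1))


def count_operations(diff_text: str) -> dict:
    """Count add/change/delete commands in a normal-format diff."""
    counts = {"a": 0, "c": 0, "d": 0}
    for line in diff_text.splitlines():
        hunk = parse_hunk_header(line)
        if hunk is not None:
            counts[hunk.op] += 1
    return counts
```

Content lines (those starting with "<" or ">") and the "---" separator do not match the pattern, so only the change commands are counted.
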
Content of Files in the Artifact:

Datasets:

Raw Data: taken from https://zenodo.org/records/13381774.
Initial Manual Clustering: manual clustering of 309 entries from the JUnit4 and JCodec projects, with patches generated by the Mistral LLM.
Size of dataset: 309. File: Patch Analysis-anon.xlsx.
Augmented Dataset: The initial dataset was manually clustered after data augmentation.
Size of dataset: 5806 (unique). File: DataAugmentation_Approach_Patch Classification_subsection.xlsx
Validation (RQ1): Validation on unseen datasets (unseen projects and/or patches generated by an unseen LLM).
Size of dataset: 218. File: RQ2-dataset-all_patch_summaries.xlsx
Statistics (RQ2): Data used to construct statistics of LLM-generated patches in Gin from ForArtifact.zip.
Size of dataset: 3232. File: ForArtifact.zip

Dockers:

The model: built via the offline approach and packaged with Docker for use in the online approach, ready to test and use. File: model-in-a-docker-unseen−retrives−batch.tar.

Code:

Clustering code is taken from https://github.com/rashadulrakib/short-text-clustering-enhancement and applied to a new dataset.
- Clustering scripts, developed on top of the short-text-clustering work. File: clustering.zip.
Code of RQ1 is in ForArtifact.zip.

Files

Files (1.0 GB)

Name	MD5	Size
clustering.zip	330c618b46f3b36e9d1d2b83dfd32ddf	975.1 kB
DataAugmentation_Approach_Patch Classification_subsection.xlsx	8e56ef3e336ea93f06febbbcd8ffe2dd	644.4 kB
ForArtifact.zip	028d7df5da66af9964ba772394f4cb8e	42.6 MB
model-in-a-docker-unseen−retrives−batch.tar	36df55da8f197d8a05fcb85024608472	985.4 MB
Patch Analysis-anon.xlsx	edcdc948389855e2373aaa1e96d219d2	74.7 kB
RQ2-dataset-all_patch_summaries.xlsx	c12d8ec3aa5598eb153033963267ea9b	900.3 kB
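
The MD5 checksums listed above can be used to check that downloaded files are intact. The following is a minimal Python sketch, assuming the files were saved under their listed names in a single local directory. The names `md5_of_file`, `verify_downloads`, and `EXPECTED_MD5` are illustrative and not part of the artifact.

```python
import hashlib
from pathlib import Path

# File names and MD5 values copied from the file listing of this record.
EXPECTED_MD5 = {
    "clustering.zip": "330c618b46f3b36e9d1d2b83dfd32ddf",
    "DataAugmentation_Approach_Patch Classification_subsection.xlsx": "8e56ef3e336ea93f06febbbcd8ffe2dd",
    "ForArtifact.zip": "028d7df5da66af9964ba772394f4cb8e",
    "model-in-a-docker-unseen−retrives−batch.tar": "36df55da8f197d8a05fcb85024608472",
    "Patch Analysis-anon.xlsx": "edcdc948389855e2373aaa1e96d219d2",
    "RQ2-dataset-all_patch_summaries.xlsx": "c12d8ec3aa5598eb153033963267ea9b",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == checksum
    return results
```

Calling `verify_downloads("path/to/downloads")` (a placeholder directory) returns a dictionary in which a False value marks a file that is missing or whose checksum differs from the listed one.
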

	All versions	This version
Views	194	194
Downloads	231	231
Data volume	31.5 GB	31.5 GB
