Artifact of LLM-Guided Genetic Improvement: Envisioning Semantic Aware Automated Software Evolution
Description
Abstract:
We propose the investigation of a new research line on AI-powered Genetic Improvement (GI) aimed at incorporating semantic-aware search. We take a first step by augmenting GI with automated clustering of LLM edits. We provide initial empirical evidence that our proposal, dubbed PatchCat, allows us to automatically and effectively categorize LLM-suggested patches. PatchCat identified 18 different types of software patches and categorized newly suggested patches with high accuracy. It also enabled detecting NoOp edits in advance and, prospectively, skipping test-suite execution to save resources in many cases. These results, coupled with the fact that PatchCat works with small, local LLMs, are a promising step toward interpretable, efficient, and green GI.
Note: This is the artifact of the ASE NIER 2025 publication.
Examples for RQ2 (full example text):
Reasons for inconsistent tagging between the model and human annotators:
Case: different prioritization by the model and by humans:
The model weighs the importance of changes differently than humans do. For example, one entry was tagged Category #12 by the model and #9 by a human, who added the note:
"12 could also be considered but 9 is more important. Also makes changes to the return, but not mentioned in the description",
for the following short 15-word description generated by the LLM:
"A Java code diff with 4 changes: catches ParseException, adds variable, and updates logic".
Case: incomplete or unclear summaries:
The short 15-word description may not cover all changes, or the summary may be unclear. For example, in the following diff an "if" block was added:
```diff
125d124
< // Fri, 21 Nov 1997 09:55:06 -0600
127c126,129
< final SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
---
> final Locale locale = Locale.ENGLISH;
> final SimpleDateFormat format = new SimpleDateFormat(pattern, locale);
> // assume no header date by default
> boolean hasHeaderDate = false;
129a132
> hasHeaderDate = true;
133a137,140
> if (hasHeaderDate) {
> // add a newline after the date field
> header.append("\n");
> }
```
This change was not clear from the generated summary:
"SimpleDateFormat constructor and locale usage changed, with additional logic for header date detection".
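The diff above uses the default ("normal") output format of the POSIX `diff` utility, where each change command has the form old-range, action letter (`a` add, `c` change, `d` delete), new-range; for example, `127c126,129` means old line 127 was replaced by new lines 126 to 129. As a minimal sketch for counting and inspecting such change commands (the function `parse_normal_diff` is a hypothetical helper, not part of the artifact), one could write:

```python
import re

# One POSIX "normal" diff change command, e.g. "125d124", "127c126,129", "133a137,140".
HUNK_HEADER = re.compile(r"^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$")

ACTIONS = {"a": "add", "c": "change", "d": "delete"}


def parse_normal_diff(text):
    """Return one (action, (old_start, old_end), (new_start, new_end)) tuple per change command.

    For 'a', the old-side number is the line after which text is added;
    for 'd', the new-side number is the line after which the deleted text
    would have appeared. Content lines start with '<', '>' or '---' and are skipped.
    """
    hunks = []
    for line in text.splitlines():
        match = HUNK_HEADER.match(line.strip())
        if match:
            old_start, old_end, action, new_start, new_end = match.groups()
            hunks.append((
                ACTIONS[action],
                (int(old_start), int(old_end or old_start)),
                (int(new_start), int(new_end or new_start)),
            ))
    return hunks
```

Applied to the diff above, this yields four change commands: one deletion, one change, and two additions.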
Contents of the artifact files:
Datasets:
- Raw data: taken from https://zenodo.org/records/13381774.
- Initial manual clustering: clustering of 309 entries from the JUnit4 and JCodec projects, with patches generated by the Mistral LLM. Size of dataset: 309. File: Patch Analysis-anon.xlsx.
- Augmented dataset: the initial dataset was manually clustered after data augmentation. Size of dataset: 5806 (unique). File: DataAugmentation_Approach_Patch Classification_subsection.xlsx.
- Validation (RQ1): validation on unseen datasets (unseen projects and/or patches generated by an unseen LLM). Size of dataset: 218. File: RQ2-dataset-all_patch_summaries.xlsx.
- Statistics (RQ2): data used to compute statistics on LLM-generated patches in Gin. Size of dataset: 3232. File: ForArtifact.zip.
Docker:
- Model: the model is built with the offline approach and packaged in a Docker image for use in the online approach, ready to test and use. File: model-in-a-docker-unseen-retrives-batch.tar.
Code:
- Clustering approach: taken from https://github.com/rashadulrakib/short-text-clustering-enhancement and applied to a new dataset.
- Clustering scripts: developed on top of the short-text clustering work. File: clustering.zip.
- Code for RQ1: included in ForArtifact.zip.
Files (total size: 1.0 GB)

| MD5 checksum | Size |
|---|---|
| 330c618b46f3b36e9d1d2b83dfd32ddf | 975.1 kB |
| 8e56ef3e336ea93f06febbbcd8ffe2dd | 644.4 kB |
| 028d7df5da66af9964ba772394f4cb8e | 42.6 MB |
| 36df55da8f197d8a05fcb85024608472 | 985.4 MB |
| edcdc948389855e2373aaa1e96d219d2 | 74.7 kB |
| c12d8ec3aa5598eb153033963267ea9b | 900.3 kB |
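Each file in the record is listed with an MD5 checksum. As a minimal sketch using only the Python standard library, a downloaded file could be checked against these checksums as follows; the helper names `md5_of_stream`, `md5_of_file`, `verify`, and `matches_listed_checksum`, and the placeholder path in the usage comment, are hypothetical and not part of the artifact.

```python
import hashlib

# MD5 checksums as listed for the six files in this record.
LISTED_MD5 = {
    "330c618b46f3b36e9d1d2b83dfd32ddf",
    "8e56ef3e336ea93f06febbbcd8ffe2dd",
    "028d7df5da66af9964ba772394f4cb8e",
    "36df55da8f197d8a05fcb85024608472",
    "edcdc948389855e2373aaa1e96d219d2",
    "c12d8ec3aa5598eb153033963267ea9b",
}


def md5_of_stream(stream, chunk_size=1 << 20):
    """Hex MD5 of a binary stream, read in chunks so that large downloads
    (the biggest file here is 985.4 MB) never need to fit in memory."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path, chunk_size=1 << 20):
    """Hex MD5 of the file at `path`."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle, chunk_size)


def verify(path, expected_md5):
    """True if the file's MD5 equals `expected_md5` (whitespace and case ignored)."""
    return md5_of_file(path) == expected_md5.strip().lower()


def matches_listed_checksum(path):
    """True if the file's MD5 equals any checksum listed for this record."""
    return md5_of_file(path) in LISTED_MD5


# Hypothetical usage with a placeholder path:
# print(verify("downloaded_file.zip", "<checksum listed for that file>"))
```

Reading in fixed-size chunks keeps memory use constant regardless of file size; `matches_listed_checksum` is handy when you only want to confirm that a download is intact without looking up its specific checksum.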