Exploring Fine-Grained Bug Report Categorization with Large Language Models and Prompt Engineering: An Empirical Study
Description
Replication Package for: Exploring Fine-Grained Bug Report Categorization with Large Language Models and Prompt Engineering: An Empirical Study
This repository contains the scripts to replicate the experiments for the paper "Exploring Fine-Grained Bug Report Categorization with Large Language Models and Prompt Engineering: An Empirical Study."
Zenodo
The replication package is available at Zenodo.
Dependencies
The following dependencies are required to run the experiments:
- Ollama: The user must download models from Ollama before executing the pipeline.
- Anaconda / Miniconda: Download and configure Anaconda or Miniconda for managing Python environments.
Create a Python environment using the environment file:
conda env create -f environment.yml
How to Run
The main script to run the experiments is prompt.py. To execute the experiment:
conda activate llm-fine-grained-bug-categorization
python prompt.py -c config.ini
config.ini contains the configurations for the experiments. Below is an example of the configuration file:
[Model]
name = gemma2:9b-instruct-q4_0
temperature = 0
set_system = True
num_ctx = 8192
num_runs = 1
;diid = XERCESC-211
;host = http://localhost:11435
[Prompt]
reverse = True
type = 83215003
Configuration Explanation:
- name: Specifies the model name. You can list multiple models separated by commas, e.g.:
  name = llama3.1:8b-instruct-q4_0, llama3:8b-instruct-q4_0, qwen2:7b-instruct-q4_0, gemma:7b-instruct-v1.1-q4_0, gemma2:9b-instruct-q4_0, starling-lm:7b-beta-q4_0, aya:8b-23-q4_0
- temperature: Defines the sampling temperature for the LLM.
- set_system: Flag to set the system message when invoking the generate endpoint.
- num_ctx: LLM context window size.
- num_runs: The number of runs to perform (currently only one run is supported).
- diid: Bug report ID (for debugging purposes; used to test the prompt and LLM on a single report).
- host: Host URL to use for the Ollama server.
- reverse: Flag to reverse the execution order of the dataset.
- type: Specifies the prompt type. The number corresponds to the prompt template in the prompts/ folder.
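The options above can be read with Python's standard configparser. The sketch below parses the example config.ini and assembles a request body in the shape Ollama's /api/generate endpoint expects; it is an illustration only, and the helper names (load_model_config, build_generate_payload) are not taken from prompt.py:

```python
import configparser

def load_model_config(path="config.ini"):
    """Parse the [Model] and [Prompt] sections of the experiment config."""
    cfg = configparser.ConfigParser()
    cfg.read(path)
    model, prompt = cfg["Model"], cfg["Prompt"]
    return {
        # Several models may be listed, separated by commas.
        "models": [m.strip() for m in model["name"].split(",")],
        "temperature": model.getfloat("temperature"),
        "set_system": model.getboolean("set_system"),
        "num_ctx": model.getint("num_ctx"),
        "num_runs": model.getint("num_runs"),
        # Keys commented out with ';' fall back to defaults.
        "host": model.get("host", "http://localhost:11434"),
        "reverse": prompt.getboolean("reverse"),
        "prompt_type": prompt["type"],
    }

def build_generate_payload(conf, model_name, prompt_text, system_text=None):
    """Build a body for Ollama's /api/generate endpoint from the config."""
    payload = {
        "model": model_name,
        "prompt": prompt_text,
        "stream": False,
        "options": {"temperature": conf["temperature"], "num_ctx": conf["num_ctx"]},
    }
    if conf["set_system"] and system_text:
        payload["system"] = system_text  # system message, per the set_system flag
    return payload
```

The payload would then be POSTed to `<host>/api/generate` once per model and per run.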
Project Structure
|--- README.md : User guidance.
|--- analysis : Folder to save the extracted categories and classification metrics.
|--- base : Folder containing Ollama API implementation to interact with the LLM.
|--- corrected_dataset : Folder containing the corrected dataset of bug report categories, refined through validation methods to improve human annotations.
|--- experimentResults : Folder to save the LLM responses.
|--- export : Folder containing the dataset of 221,184 LLM-generated bug report categories, created using six LLMs, nine prompt types, and four output configurations for 1,024 bug reports exported in HTML format.
|--- prompts : Folder with prompt templates.
|--- reports : Folder containing the bug report XML files (zipped with `.gz`).
|--- all_non_sensical.xlsx : Contains all LLM responses for which no valid bug report category could be extracted.
|--- all_non_sensical-odd-analysis.xlsx : Manual analysis of Out-of-Dictionary (OOD) bug report categories generated by the LLMs.
|--- dataset_additional_analysis.xlsx : Catolino dataset analysis.
|--- config.ini : Example configuration file.
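Since the reports/ folder stores bug reports as gzip-compressed XML, they can be loaded with the standard library alone. A minimal sketch follows; the file name and XML element names are hypothetical, as the actual report schema may differ:

```python
import gzip
import xml.etree.ElementTree as ET

def load_report(path):
    """Decompress a .gz bug report file and parse its XML root element."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return ET.fromstring(fh.read())

# Hypothetical usage; element names depend on the real schema:
# root = load_report("reports/XERCESC-211.xml.gz")
# summary = root.findtext("summary")
```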
Analysis Scripts
The following scripts are used to analyze the results of the experiments. You can run them to replicate various parts of the analysis:
- analyze.py: Calculates classification metrics for bug report categorization.
- analyzeCore.py: Core logic for the classification metrics analysis of bug report categorization.
- gainLoss.py: Analyzes the gain and loss of unique categories across different models and configurations.
- plotDiscardDist.py: Analyzes LLM responses and generates a plot of the distribution of common output deviations.
- labelCorrectness.py: Analyzes the correctness of labels using agreements and disagreements from LLM-generated responses via voting.
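To illustrate the kind of classification metrics the analysis scripts report, the sketch below computes per-class precision, recall, and F1 from gold and predicted category labels using only the standard library. This is a generic reimplementation for illustration, not the code of analyze.py:

```python
from collections import Counter

def per_class_metrics(gold, pred):
    """Compute precision, recall, and F1 for each category label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p, but the gold label was g
            fn[g] += 1  # gold label g was missed
    metrics = {}
    for label in set(gold) | set(pred):
        precision = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        recall = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics
```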
Files
llm-fine-grained-bug-categorization.zip (487.4 MB)
md5: 176c8a8cc8f0c9f601fbd98a203958cc