Published August 3, 2024
| Version v3
Dataset
Open
Maniple
This repository contains the code, scripts, and data necessary to reproduce the paper "The Fact Selection Problem in LLM-Based Program Repair".
Installation
Before installing the project, ensure you have the following prerequisites installed on your system:
- Python version 3.10 or higher.
Follow these steps to install and set up the project on your local machine:
cd maniple
python3 -m pip install .
Structure of Directories
The project is organized into several directories, each serving a specific purpose:

    data/ # Training and testing datasets
        BGP32.zip/ # Sampled 32 bugs from the BugsInPy dataset
            black/ # The bug project folder
                10/ # The bug ID folder
                    100000001/ # The bitvector used for prompting
                        prompt.md # The prompt used for this bitvector
                        response_1.md # The response from the model
                        response_1.json # The response in JSON format
                        response_1.patch # The response in patch format
                        result_1.json # Testing result
                        ...
        BGP32-without-cot.zip # GPT responses for 32 bugs without CoT prompting
        BGP314.zip # 314 bugs from the BugsInPy dataset
        BGP157Ply1-llama3-70b.zip # Experiment with the llama3 model on the BGP157Ply1 dataset
        BGP32-permutation.zip # Permutation experiment on the BGP32 dataset
    maniple/ # Scripts for getting facts and generating prompts
        strata_based/ # Scripts for generating prompts
        utils/ # Utility functions
        metrics/ # Scripts for calculating metrics for the dataset
    patch_correctness_labelling.xlsx # The labelling of patch correctness
    experiment.ipynb # Jupyter notebook for training models
    experiment-initialization-resources/ # Contains raw facts for each bug
        bug-data/ # Raw facts for each bug
            ansible/ # Bug project folder
                5/ # Bug ID folder
                    bug-info.json # Metadata for the bug
                    facts_in_prompt.json # Facts used in the prompt
                    processed_facts.json # Processed facts
                    external_facts.json # GitHub issues for this bug
                    static-dynamic-facts.json # Static and dynamic facts
                    ...
        datasets-list/ # Subsets from the BugsInPy dataset
        strata-bitvector/ # Debugging information for bitvectors
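The layout above can be traversed programmatically. The following is a minimal sketch, assuming one of the dataset archives has been extracted to a directory whose top level holds the project folders (`<project>/<bug_id>/<bitvector>/`). The names `PromptDir`, `iter_prompt_dirs`, and `count_files` are hypothetical helpers for illustration and are not part of the `maniple` package.

```python
from pathlib import Path
from typing import Iterator, NamedTuple


class PromptDir(NamedTuple):
    project: str    # e.g. "black"
    bug_id: str     # e.g. "10"
    bitvector: str  # e.g. "100000001"
    path: Path


def iter_prompt_dirs(dataset_root: Path) -> Iterator[PromptDir]:
    """Yield every <project>/<bug_id>/<bitvector>/ directory that holds a prompt.md."""
    for prompt_file in sorted(dataset_root.glob("*/*/*/prompt.md")):
        bitvector_dir = prompt_file.parent
        bug_dir = bitvector_dir.parent
        yield PromptDir(
            project=bug_dir.parent.name,
            bug_id=bug_dir.name,
            bitvector=bitvector_dir.name,
            path=bitvector_dir,
        )


def count_files(prompt_dir: PromptDir) -> dict[str, int]:
    """Count files by the naming patterns shown in the directory tree."""
    return {
        "responses": len(list(prompt_dir.path.glob("response_*.md"))),
        "patches": len(list(prompt_dir.path.glob("response_*.patch"))),
        "results": len(list(prompt_dir.path.glob("result_*.json"))),
    }
```

For example, `for d in iter_prompt_dirs(Path("BGP32")): print(d.project, d.bug_id, d.bitvector, count_files(d))` would list how many responses, patches, and test results exist for each bitvector, where `BGP32` stands for wherever the archive was extracted.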
Steps to Reproduce the Experiments
Follow the steps below sequentially to reproduce the experiments on the 314 bugs in BugsInPy with our bitvector-based prompts.
Prepare the Dataset
The CLI scripts under the `maniple` directory provide commands to download the bugs and prepare their environments. To do this for every bug in a dataset, use the `prep` command:
maniple prep --dataset 314-dataset
This script will automatically download all 314 bugs from GitHub, create a virtual environment for each bug, and install the necessary dependencies.
Fact Extraction
Then you can extract facts from the bug data using the `extract` command as follows:
maniple extract --dataset 314-dataset --output-dir data/BGP314
This script will extract facts from the bug data and save them in the specified output directory.
You can find all extracted facts under the `experiment-initialization-resources/bug-data` directory.
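Once the facts are extracted, they can be inspected from Python. The sketch below loads whichever of the per-bug JSON files listed in the directory structure are present and summarizes their top-level shape. It makes no assumption about the schema inside each file. `FACT_FILES`, `load_bug_facts`, and `describe` are hypothetical names used only for this illustration.

```python
import json
from pathlib import Path
from typing import Any

# File names taken from the directory structure described earlier.
FACT_FILES = (
    "bug-info.json",
    "facts_in_prompt.json",
    "processed_facts.json",
    "external_facts.json",
    "static-dynamic-facts.json",
)


def load_bug_facts(bug_dir: Path) -> dict[str, Any]:
    """Parse every known fact file that exists in one bug folder."""
    facts: dict[str, Any] = {}
    for name in FACT_FILES:
        path = bug_dir / name
        if path.is_file():
            with path.open(encoding="utf-8") as fh:
                facts[name] = json.load(fh)
    return facts


def describe(facts: dict[str, Any]) -> dict[str, Any]:
    """Map each file to its sorted top-level keys (objects) or its JSON type name."""
    summary: dict[str, Any] = {}
    for name, content in facts.items():
        if isinstance(content, dict):
            summary[name] = sorted(content)
        else:
            summary[name] = type(content).__name__
    return summary
```

A call such as `describe(load_bug_facts(Path("experiment-initialization-resources/bug-data/ansible/5")))` would show which fact files exist for that bug and what keys each one exposes.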
Generate Bitvector Specific Prompts and Responses
First, generate the bitvectors for the facts. The 128 bitvectors used in our paper can be generated with the following command:
python3 -m maniple.strata_based.fact_bitvector_generator
You can also define your own bitvectors; put them under the `experiment-initialization-resources/strata-bitvectors` directory, following the format of the example bitvectors used for our paper.
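To make the idea of strata-driven bitvectors concrete, here is a minimal sketch under stated assumptions. It treats each stratum as a list of allowed bit patterns and builds every bitvector by picking one pattern per stratum. This is an illustrative model only: the toy table below is hypothetical and is not the format of the repository's strata table or the output of `fact_bitvector_generator`. Under this model, seven two-option strata would give 2^7 = 128 bitvectors, but whether the paper's design has that shape is an assumption.

```python
from itertools import product


def enumerate_bitvectors(strata: list[list[str]]) -> list[str]:
    """Concatenate one allowed pattern per stratum, for every combination."""
    return ["".join(choice) for choice in product(*strata)]


def selected_positions(bitvector: str) -> list[int]:
    """Return the zero-based positions whose bit is '1'."""
    if set(bitvector) - {"0", "1"}:
        raise ValueError(f"not a bitvector: {bitvector!r}")
    return [i for i, bit in enumerate(bitvector) if bit == "1"]


# Hypothetical toy table: three strata, the middle one covering two fact bits
# that are switched on or off together.
toy_strata = [["0", "1"], ["00", "11"], ["0", "1"]]
vectors = enumerate_bitvectors(toy_strata)

print(len(vectors))                     # 2 * 2 * 2 combinations
print(selected_positions("100000001"))  # positions enabled in a folder name such as 100000001
```

Grouping bits into strata keeps the number of prompts manageable: the count grows with the product of the options per stratum instead of with 2 raised to the number of individual facts.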
To reproduce our experiment's prompts and responses, use the commands below, replacing <YOUR_OPENAI_KEY> with your own key.
# On Linux/macOS:
export OPENAI_API_KEY=<YOUR_OPENAI_KEY>
# On Windows:
setx OPENAI_API_KEY <YOUR_OPENAI_KEY>
python3 -m maniple.strata_based.prompt_generator --database BGP314 --partition 10 --start_index 1 --trial 15
You can also build your own prompts from custom bitvectors using our extracted facts; the command above only reproduces our prompts and responses.
This script generates prompts and responses for all 314 bugs in the dataset by enumerating all possible bitvectors according to the current strata design specified in `maniple/strata_based/fact_strata_table.json`. With `--trial 15`, the script generates 15 responses for each prompt; with `--partition 10`, it starts 10 threads to speed up the process.
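The interaction between the thread count and the number of trials can be sketched as follows. This is not the repository's implementation: `generate_responses` is a hypothetical helper, and `request_fn` stands in for whatever function actually sends a prompt to the model. The sketch only shows the general pattern of spreading prompts over a fixed number of worker threads while collecting several responses per prompt.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def generate_responses(
    prompts: dict[str, str],
    request_fn: Callable[[str], str],
    trials: int,
    partition: int,
) -> dict[str, list[str]]:
    """Collect `trials` responses per prompt using `partition` worker threads."""

    def run_one(item: tuple[str, str]) -> tuple[str, list[str]]:
        key, prompt = item
        return key, [request_fn(prompt) for _ in range(trials)]

    with ThreadPoolExecutor(max_workers=partition) as pool:
        return dict(pool.map(run_one, prompts.items()))


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call, so the sketch runs offline."""
    return f"patch for: {prompt}"


results = generate_responses(
    {"black/10/100000001": "prompt A", "black/10/000000000": "prompt B"},
    fake_model,
    trials=15,
    partition=10,
)
print({key: len(responses) for key, responses in results.items()})
```

Threads suit this workload because each request mostly waits on the network. The total number of requests is the number of prompts multiplied by the number of trials, so the trial count directly scales API cost.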
Testing Generated Patches
Use the following command:
maniple validate --output-dir data/BGP314
This script will validate the generated patches for the specified bug and save the results in the specified output directory. The tests come from the developer's fix commit.
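For unattended runs, the steps above can be chained from Python. The sketch below copies the commands verbatim from this section and runs them in order, stopping at the first failure. The helper `run_pipeline` and its `dry_run` flag are hypothetical conveniences, and the sketch assumes `maniple` is installed and `OPENAI_API_KEY` is already set in the environment.

```python
import shlex
import subprocess

# Commands copied from the steps in this section, in the order they appear.
PIPELINE = [
    "maniple prep --dataset 314-dataset",
    "maniple extract --dataset 314-dataset --output-dir data/BGP314",
    "python3 -m maniple.strata_based.fact_bitvector_generator",
    "python3 -m maniple.strata_based.prompt_generator "
    "--database BGP314 --partition 10 --start_index 1 --trial 15",
    "maniple validate --output-dir data/BGP314",
]


def run_pipeline(commands: list[str] = PIPELINE, dry_run: bool = True) -> list[list[str]]:
    """Print each command and, unless dry_run is set, execute it.

    subprocess.run(..., check=True) raises CalledProcessError on a non-zero
    exit code, so later steps never run on top of a failed one.
    """
    argv_list = [shlex.split(command) for command in commands]
    for argv in argv_list:
        print("$", shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
    return argv_list
```

Calling `run_pipeline()` only prints the commands; `run_pipeline(dry_run=False)` executes them.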
Contributing
Contributions to this project are welcome! Please submit a PR if you find any bugs or have any suggestions.
License
This project is licensed under the MIT License; see the LICENSE file for details.
Files
Name | Size |
---|---|
maniple.zip (md5:61673640ff8c2afd3cc59e9803a9c911) | 1.0 GB |