Published August 3, 2024 | Version v3
Dataset Open

Maniple

This repository contains code, scripts and data necessary to reproduce the paper "The Fact Selection Problem in LLM-Based Program Repair".

Installation

Before installing the project, ensure you have the following prerequisites installed on your system:
- Python version 3.10 or higher.

After extracting `maniple.zip`, follow these steps to install and set up the project on your local machine:

cd maniple
python3 -m pip install .
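After installation, a quick sanity check can confirm both prerequisites: the interpreter version and that the `maniple` command used in the steps below is on your `PATH`. This is a minimal sketch; the helper name `check_environment` is hypothetical and not part of the package.

```python
import shutil
import sys


def check_environment(min_version=(3, 10), command="maniple"):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # sys.version_info compares element-wise against a tuple such as (3, 10).
    if sys.version_info < min_version:
        found = ".".join(map(str, sys.version_info[:3]))
        wanted = ".".join(map(str, min_version))
        problems.append(f"Python {wanted}+ required, found {found}")
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(command) is None:
        problems.append(f"'{command}' not found on PATH; did `pip install .` succeed?")
    return problems


# Usage: an empty list means the environment looks ready.
# print(check_environment())
```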

Structure of Directories

The project is organized into several directories, each serving a specific purpose:
data/               # Training and testing datasets
  BGP32.zip/        # 32 bugs sampled from the BugsInPy dataset
    black/          # The bug project folder
      10/           # The bug ID folder
        100000001/      # The bitvector used for prompting
          prompt.md         # The prompt used for this bitvector
          response_1.md     # The response from the model
          response_1.json   # The response in JSON format
          response_1.patch  # The response in patch format
          result_1.json     # Testing result
    ...
  BGP32-without-cot.zip     # GPT responses for the 32 bugs without chain-of-thought (CoT) prompting
  BGP314.zip                # 314 bugs from the BugsInPy dataset
  BGP157Ply1-llama3-70b.zip # Experiment with the Llama 3 70B model on the BGP157Ply1 dataset
  BGP32-permutation.zip     # Permutation experiment on the BGP32 dataset
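Once an archive such as `BGP32.zip` is extracted, its layout (`<project>/<bug id>/<bitvector>/response_N.*`) can be indexed with a short script. This sketch relies only on the folder and file names shown above; it assumes the archive was extracted to a directory such as `data/BGP32` and does not interpret the contents of the JSON files, whose schema is not documented here. The function name `index_responses` is hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches file names such as response_1.md, response_1.patch, result_1.json.
_TRIAL_FILE = re.compile(r"^(response|result)_(\d+)\.(md|json|patch)$")


def index_responses(root):
    """Map (project, bug_id, bitvector) -> {trial number: set of file names}."""
    index = defaultdict(lambda: defaultdict(set))
    # Expected depth: root/<project>/<bug_id>/<bitvector>/<file>
    for path in Path(root).glob("*/*/*/*"):
        match = _TRIAL_FILE.match(path.name)
        if not path.is_file() or match is None:
            continue  # skips prompt.md and anything unexpected
        bitvector_dir = path.parent
        key = (bitvector_dir.parent.parent.name,  # project, e.g. "black"
               bitvector_dir.parent.name,         # bug id, e.g. "10"
               bitvector_dir.name)                # bitvector, e.g. "100000001"
        index[key][int(match.group(2))].add(path.name)
    return index
```

For example, `index_responses("data/BGP32")[("black", "10", "100000001")]` would list, per trial, which response and result files exist for that bitvector.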

maniple/            # Scripts for extracting facts and generating prompts
  strata_based/     # Scripts for generating prompts
  utils/            # Utility functions
  metrics/          # Scripts for calculating metrics for datasets

patch_correctness_labelling.xlsx  # The labelling of patch correctness
experiment.ipynb    # Jupyter notebook for training models

experiment-initialization-resources/  # Contains raw facts for each bug
  bug-data/         # Raw facts for each bug
    ansible/        # Bug project folder
      5/            # Bug ID folder
        bug-info.json              # Metadata for the bug
        facts_in_prompt.json       # Facts used in the prompt
        processed_facts.json       # Processed facts
        external_facts.json        # GitHub issues for this bug
        static-dynamic-facts.json  # Static and dynamic facts
    ...
  datasets-list/    # Subsets from BugsInPy dataset
  strata-bitvector/ # Debugging information for bitvectors
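The per-bug fact files listed above can be loaded together for inspection. This is a minimal sketch, assuming the repository root is the working directory; it treats each file as generic JSON, since the schemas are not described in this README, and reports which of the listed files are absent. `load_bug_facts` is a hypothetical helper, not part of `maniple`.

```python
import json
from pathlib import Path

# File names as listed under bug-data/<project>/<bug id>/ above.
FACT_FILES = (
    "bug-info.json",
    "facts_in_prompt.json",
    "processed_facts.json",
    "external_facts.json",
    "static-dynamic-facts.json",
)


def load_bug_facts(project, bug_id,
                   root="experiment-initialization-resources/bug-data"):
    """Return (facts, missing): parsed JSON per file name, and absent files."""
    bug_dir = Path(root) / project / str(bug_id)
    facts, missing = {}, []
    for name in FACT_FILES:
        path = bug_dir / name
        if path.is_file():
            with path.open(encoding="utf-8") as fh:
                facts[name] = json.load(fh)
        else:
            missing.append(name)
    return facts, missing
```

For example, `facts, missing = load_bug_facts("ansible", 5)` loads the files for the bug shown in the tree.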

Steps to Reproduce the Experiments

Please follow the steps below in order to reproduce our experiments on the 314 BugsInPy bugs with bitvector-based prompts.

Prepare the Dataset

The CLI scripts under the `maniple` directory provide commands to download each bug and prepare its environment. To do this for the whole dataset, use the `prep` command:

maniple prep --dataset 314-dataset

This command automatically downloads all 314 bugs from GitHub, creates a virtual environment for each bug, and installs the necessary dependencies.

Fact Extraction

Then you can extract facts from the bug data using the `extract` command as follows:
maniple extract --dataset 314-dataset --output-dir data/BGP314

This script will extract facts from the bug data and save them in the specified output directory.

You can find all extracted facts under the `experiment-initialization-resources/bug-data` directory.

Generate Bitvector Specific Prompts and Responses

First, you need to generate bitvectors for the facts. The 128 bitvectors used in our paper can be generated with the following command:
python3 -m maniple.strata_based.fact_bitvector_generator
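One way to read the count of 128: if the strata design consisted of seven independent strata, each either included in or excluded from a prompt, there would be 2^7 = 128 combinations. The sketch below enumerates such combinations with `itertools.product`. The seven-strata assumption and the `enumerate_bitvectors` name are illustrative only; the real bitvectors (e.g. the nine-character `100000001` above) follow the layout in `maniple/strata_based/fact_strata_table.json`, which this sketch does not reproduce, and whether the all-zero vector is counted depends on that design.

```python
from itertools import product


def enumerate_bitvectors(n_strata):
    """Yield every include/exclude combination as a string of 0s and 1s.

    Bit i is '1' when stratum i is included in the prompt. The mapping from
    bit positions to concrete facts is an assumption here; maniple defines
    its own layout in maniple/strata_based/fact_strata_table.json.
    """
    for bits in product("01", repeat=n_strata):
        yield "".join(bits)


all_vectors = list(enumerate_bitvectors(7))
print(len(all_vectors))   # 2**7 = 128
print(all_vectors[:3])    # ['0000000', '0000001', '0000010']
```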

You can also define custom bitvectors; place them under the `experiment-initialization-resources/strata-bitvectors` directory and follow the format of the example bitvectors used in our paper.

To reproduce the prompts and responses from our experiments, run the commands below, replacing <YOUR_OPENAI_KEY> with your own key:

# On Linux/macOS:
export OPENAI_API_KEY=<YOUR_OPENAI_KEY>

# On Windows (setx applies to new terminals only; open a new one before continuing):
setx OPENAI_API_KEY <YOUR_OPENAI_KEY>

python3 -m maniple.strata_based.prompt_generator --database BGP314 --partition 10 --start_index 1 --trial 15
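Because prompt generation calls the OpenAI API, a missing or placeholder key may only surface once requests start. A small pre-flight check like the hypothetical `require_openai_key` below can catch that early; it only reads the `OPENAI_API_KEY` variable set above and makes no network call.

```python
import os


def require_openai_key(var="OPENAI_API_KEY"):
    """Return the key from the environment, or raise with a helpful message."""
    key = os.environ.get(var, "").strip()
    # Reject both an unset variable and the literal placeholder from this README.
    if not key or key == "<YOUR_OPENAI_KEY>":
        raise RuntimeError(
            f"{var} is not set. Export it (Linux/macOS) or set it with setx "
            "and open a new terminal (Windows) before generating prompts."
        )
    return key
```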

You can also build customized prompts from custom bitvectors using our extracted facts; the command above only reproduces our prompts and responses.

This script generates prompts and responses for all 314 bugs in the dataset by enumerating all possible bitvectors according to the current strata design, specified in `maniple/strata_based/fact_strata_table.json`. With `--trial 15`, the script generates 15 responses for each prompt, and with `--partition 10`, it starts 10 threads to speed up the process.
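To illustrate what `--partition 10` and `--trial 15` imply, here is a sketch of splitting the bugs into ten chunks handled by a thread pool. It is not maniple's implementation: `split_into_partitions`, `process_partition`, and the round-robin split are assumptions for illustration. It also gives an upper bound on the workload: if every one of the 314 bugs were prompted with all 128 bitvectors, 15 trials each would mean 314 × 128 × 15 = 602,880 responses; the actual number may be lower if not every bitvector applies to every bug.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(items, n):
    """Round-robin split of items into n lists (assumed strategy)."""
    return [items[i::n] for i in range(n)]


def process_partition(bugs, bitvectors, trials):
    """Placeholder worker: returns how many (bug, bitvector, trial) jobs it owns.

    A real worker would build each prompt and call the model here.
    """
    return len(bugs) * len(bitvectors) * trials


def run(bugs, bitvectors, trials=15, partitions=10):
    chunks = split_into_partitions(bugs, partitions)
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        counts = pool.map(process_partition, chunks,
                          [bitvectors] * partitions, [trials] * partitions)
        return sum(counts)


bugs = [f"bug-{i}" for i in range(314)]              # stand-ins for the 314 bug IDs
bitvectors = [format(i, "07b") for i in range(128)]  # stand-ins for 128 bitvectors
print(run(bugs, bitvectors))  # 314 * 128 * 15 = 602880
```

Threads are a reasonable fit here because each job spends most of its time waiting on API responses rather than computing.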

Testing Generated Patches

To test the generated patches, use the following command:
maniple validate --output-dir data/BGP314

This command validates the generated patches in the specified output directory and saves the test results there. The tests come from the developers' fix commit for each bug.
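The commands in this section can be chained from a single script. This sketch only wraps the commands as documented above with `subprocess.run`, using `sys.executable` in place of `python3`; the `build_pipeline` and `run_pipeline` names are hypothetical. The default `dry_run=True` just prints the commands so you can review them first, and the prompt-generation step still needs `OPENAI_API_KEY` to be set.

```python
import subprocess
import sys


def build_pipeline(dataset="314-dataset", output_dir="data/BGP314"):
    """Return the documented reproduction commands, in order."""
    return [
        ["maniple", "prep", "--dataset", dataset],
        ["maniple", "extract", "--dataset", dataset, "--output-dir", output_dir],
        [sys.executable, "-m", "maniple.strata_based.fact_bitvector_generator"],
        [sys.executable, "-m", "maniple.strata_based.prompt_generator",
         "--database", "BGP314", "--partition", "10",
         "--start_index", "1", "--trial", "15"],
        ["maniple", "validate", "--output-dir", output_dir],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; when dry_run is False, run it and stop on failure."""
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a step exits non-zero.
            subprocess.run(cmd, check=True)


run_pipeline(build_pipeline())  # dry run: prints the five commands
```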

Contributing

Contributions to this project are welcome! Please submit a PR if you find any bugs or have any suggestions.
 

License

This project is licensed under the MIT License; see the LICENSE file for details.

Files

maniple.zip (1.0 GB)
md5:61673640ff8c2afd3cc59e9803a9c911