There is a newer version of the record available.

Published May 7, 2022 | Version 0.0.1
Software Open

Reproduction Package for ASE 2022 Submission "Rete: Learning Namespace Representation for Program Repair"

Authors/Creators

Description

## Reproduction Package ##

This is the reproduction package for Rete for ASE submission.

 

## File System Contents ##

  1. Rete-Trident - Source code for running rete + Trident.
  2. rete-feature-extractor - Feature extraction process of Rete and Prophet for Python.
  3. Scripts - Script for graph and dataset information.
  4. eval - A sample example from manybugs which can run both Trident, Rete + Trident and SemFix

 

## Instructions to Run each of the Components ##

The individual Readme.md's in each directory should discuss how they are run.

 

## Building the Container

Building the Dockerfile

> docker build . -t reproduction_package

Going into the docker container

>docker run -v $(pwd):/home/Trident/ --rm -ti reproduction_package /bin/bash

 

## Tools

The package helps in running the following tools

  1. Rete --> Tool constructed in this paper.
  2. Rete + Trident --> A combination of Trident's specification with rete's synthesizer and variable prioritiser.
  3. Trident
  4. Prophet --> Implemented for Python
  5. SemFix

 

# Installing Feature Extractor

Build the docker container in infra, and then build the container in the current folder.

> cd infra/

> docker build -t rete/ubuntu-16.04-llvm-3.8.1 .

> cd ..

> docker build -t rete-feature .

 

# Running

Run the docker container by first mounting your current directory into "/tmp" to "/bin/bash"

> docker run -v $(pwd):/tmp --rm -it rete /bin/bash


You can ignore the mounting if you do not need the current directory. You can either compile inside docker

after mounting or directly run the build code present in "/rete" folder.

> cd /tmp/rete

> cmake .. -DF1X_LLVM=/llvm-3.8.1

> make

> chmod u+x rete

 

To extract a json of feature information.

> ./rete -output="<path>"

 

To extract intermediate CDU chain data.

> ./rete -get-chain-data -output="<path>"

# Python-Prophet

 

To extract Prophet Feature vector from features.

python3 rete-feature-extracter/learning/prophet.py feature-vector --buggy buggy_file.pcl --correct correct_file.pcl --mod-kind <MOD_TYPE> --line-no <line_no>

# Rete-Trident

Build and run a container:

>docker build -t rtrident .
>docker run --rm -ti rtrident /bin/bash

Build runtime:

cd runtime
KLEE_INCLUDE_PATH=/klee/include make

Run examples:
 

>./tests/assignment/run
>./tests/iterations/run
>./tests/multipath/run
>./tests/simple-rvalue/run

 

# Synthesizer interface

The synthesizer supports the following functionality:

  • Verifying a given patch
  • Generating a patch
  • Generating all patches

As specification, the synthesizer uses KLEE path conditions generated by Trident runtime, conjoined with test assertions.

To verify a given patch, run the following command:

python3.6 /path/to/synthesis.py --tests <ASSERTION_SMT_FILE>:<KLEE_OUTPUT_DIR> ... \
--components <COMPONENT_SMT_FILE> ... \
--verify <LOCATION ID>:<PATCH_FILE> ...
--templates <Template_Path>
--depth <int>
--model <model>
--theta <int>

If the template path is specified then the synthesizer uses Rete's Plastic surgery based synthesis using the templates extracted from the codebase. If the path is not mentioned synthesizer defaults to naive enumeration.

The names of some files are important: the names of component files are components IDs, the names of assertion files are test IDs.

Patch file can be either an SMT file with patch semantics (same as components without holes), or JSON file that describes a tree of components and a valuation of constants. Here is an example of such JSON file:

{

"tree": {

"node": "less-or-equal",

"children": {

"left": {

"node": "x"

},

"right": {

"node": "constant_a"

}

}

},

"constants": {

"a": 2

}

}


The above JSON describes the expression "x <= 2". Specifically, it encodes the tree of components "less-or-equal[left=x, right=constant_a]", and specifies that the constant "a" equals 2. Note that "a" does not refer to "constant_a" ("constant_a" is just a component ID), but to the variable "const_a" used in the definition of constant_a component.

To generate a patch, run the following command:

python3.6 /path/to/synthesis.py --tests <ASSERTION_SMT_FILE>:<KLEE_OUTPUT_DIR> ... \
--components <COMPONENT_SMT_FILE> ...

Add `--all` to generate all patches, and `--depth <DEPTH>` to generate expressions with the length from the root component to the leaves being at most DEPTH.

 

Components are defined as SMT formulas with special variables and rules how these variables can be used. Standard components are defined in the "components" directory. Note that these components do not include variables, since variables are project-specific.

 

Component definitions use special variables that can be classified as inputs and outputs. Component inputs include:

  • Constant symbol (`const_<NAME>`) - the value of a constant parameter (to be found by SMT solver)
  • Rvalue variable reference (`rvalue_<NAME>`) - the value of a program variable (passed to trident runtime)
  • Rvalue hole (`rhole_<NAME>`) - the value of a subtree of the current component

Component outputs include:

  • Rvalue return (`rreturn`) - the value of the tree rooted at the current component
  • Lvalue return (`lreturn`) - the value of the tree rooted at the current component after it is assigned
  • Lvalue variable reference (`lvalue_<NAME>`) - the value of a program variable (passed to trident runtime) after it is assigned
  • Lvalue hole (`lhole_<NAME>`) - the value of a subtree of the current component after it is assigned

The rules for using these variables are the following:

  • Each component should have an rreturn or an lreturn or both
  • The value of each output should be always determined by the values of inputs.
  • Constants and variable references are global, i.e. if the same constant or variable name is used in two components, these will be the same constants/variables. However, the names of holes are local to each component.

The following is the correct definition of guarded assignment component:

(declare-const lhole_left (_ BitVec 32))
(declare-const rhole_left (_ BitVec 32))
(declare-const rhole_right (_ BitVec 32))
(declare-const rhole_condition Bool)
(declare-const rreturn (_ BitVec 32))
(assert (and (= rreturn rhole_right)
(ite rhole_condition (= lhole_left rhole_right) (= lhole_left rhole_left))))

The above specifies that

  • `(= rreturn rhole_right)` - the statement as a whole outputs the value of the subtree "right"
  • `(= lhole_left rhole_right)` - if the condition is true, the value of the subtree "left" will be updated to the value of the subtree "right"
  • `(= lhole_left rhole_left)` - if the condition is false, the value of the subtree "left" will remain as it was originally

The following definition is incorrect, since it only updates `x` when it is positive, but does not specify it when it is negative:

(declare-const rvalue_x (_ BitVec 32))
(declare-const lvalue_x (_ BitVec 32))
(declare-const rreturn (_ BitVec 32))
(assert (and (= rreturn 0) (=> (> rvalue_x 0) (= lvalue_x 1))))

 

 

Files

reproduction_package.zip

Files (62.9 MB)

Name Size Download all
md5:194e683250f77ed866310e963c0de238
62.9 MB Preview Download