CFStra: Enhancing Configurable Program Analysis through LLM-driven Strategy Selection Based on Code Features
This project leverages existing LLMs such as ChatGPT, together with the CPAchecker verification suite, to enhance configurable program analysis.
1. Background
Configurable Program Analysis (CPA) is a technique that allows users to customize the analysis of programs based on their specific needs and preferences. The well-known framework, CPAchecker, facilitates easy configuration and subsequent automatic execution of program analysis and verification procedures.
However, like many other program verification tools, CPAchecker requires users to choose both the strategy and configuration parameters. This often necessitates a deep understanding of the target source code and the use of expert knowledge to select strategies based on identified program features, which can be a tedious and error-prone process.
In this paper, we present a novel approach that performs program verification tasks efficiently by harnessing the capabilities of Large Language Models (LLMs) to automatically select verification strategies based on code features. Our key insight is that LLMs possess advanced program comprehension capabilities, allowing them to act as proficient substitutes for human experts in identifying code features, selecting verification strategies, and generating configuration parameters. In this spirit, we begin by extracting relevant code snippets and querying LLMs to identify code features such as arrays, pointers, floating-point operations, loops, and multi-threading.
Based on these identified code features, we propose a strategy selector to automatically choose the verification strategy. Finally, we execute the CPAchecker with the selected verification strategy and the generated configuration parameters.
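The overall flow can be summarized by the minimal sketch below. It is illustrative only: the `query_llm` helper, the strategy table, the fallback configuration, and the CPAchecker launcher path are assumptions rather than the tool's actual implementation (only the pichecker configuration file appears later in this document).
```python
# Conceptual sketch of the CFStra pipeline, not the actual implementation.
# `query_llm` is a hypothetical helper that sends a prompt to an LLM and
# returns its textual answer.
import subprocess
from pathlib import Path

STRATEGY_TABLE = {
    # feature reported by the LLM -> CPAchecker configuration file,
    # consulted in a fixed priority order (dicts preserve insertion order)
    "Concurrency": "config/svcomp23-pichecker.properties",
    # ... further (feature, configuration) pairs ...
}

def verify(source_path: str, spec_path: str, query_llm) -> None:
    code = Path(source_path).read_text()
    spec = Path(spec_path).read_text()
    # Step 1: ask the LLM to identify code and specification features.
    answer = query_llm(
        "What are the features of the given source code and specification?\n"
        + code + "\n" + spec)
    # Step 2: map the first matching feature to a verification strategy.
    config = next((cfg for feature, cfg in STRATEGY_TABLE.items()
                   if feature.lower() in answer.lower()),
                  "config/default.properties")  # assumed fallback
    # Step 3: run CPAchecker with the selected strategy (launcher path
    # depends on your CPAchecker installation).
    subprocess.run(["./cpachecker/scripts/cpa.sh", "-config", config,
                    "-spec", spec_path, source_path])
```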
We evaluated our approach on a diverse set of 600 verification tasks. The results demonstrate its effectiveness: the approach is simple and easy to understand, yet outperforms each individual verification strategy.
2. Installation and Deployment
2.1 Requirements
- Hardware Requirements: workstation or PC with a multi-core processor
- Operating System: Ubuntu 18.04 LTS or later
2.2 Clone the LLM-Toolset parent repository and test access to GPT-3.5 and GPT-4.
```sh
# clone repo.
$ git clone https://github.com/wcventure/LLM-Toolset.git
$ cd LLM-Toolset
# Install core dependencies (required)
$ pip3 install -r requirements.txt
# or
$ pip3 install -r requirements.txt -i http://pypi.douban.com/simple --trusted-host pypi.douban.com
```
The configuration template is the config-template.json file in the conf directory; copy it to create the effective config.json file:
```sh
$ cp conf/config-template.json conf/config.json
```
Please set your proxy address in conf/config.json, for example:
```json
{
    "proxy": "http://127.0.0.1:4780",
    "https_proxy": "http://127.0.0.1:4780",
    "http_proxy": "http://127.0.0.1:4780"
}
```
(If you can reach the OpenAI API directly, for example from overseas or via a system-wide proxy, leave the values as empty strings.)
A simple example is already prepared in the tutorial folder, and a simplified client library is provided in tutorial/libgpt.py. Run the following command to query ChatGPT with the sample prompt.
```sh
$ python3 tutorial/hello_world.py
```
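The script queried above might look roughly like the following sketch; the `query(prompt, model)` helper is an assumed interface for `tutorial/libgpt.py`, not its documented API:
```python
# Hypothetical sketch of tutorial/hello_world.py; the libgpt interface
# shown here (a query(prompt, model) helper) is an assumption.
from tutorial.libgpt import query

if __name__ == "__main__":
    reply = query("Hello! Please introduce yourself briefly.",
                  model="gpt-3.5-turbo")
    print(reply)
```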
2.3 Clone the CFStra submodule.
```sh
# clone sub repo.
$ cd LLM-Toolset
$ git clone https://gitee.com/lapulatos/llm4-ss.git CFStra
```
2.4 Install CPAchecker
```sh
# install cpachecker.
$ cd LLM-Toolset/CFStra
$ bash ./scripts/install/install_cpachecker.sh
```
2.5 Install BenchExec
```sh
# install benchexec.
$ git clone https://github.com/sosy-lab/benchexec.git
$ source ./scripts/install/install_benchexec.sh
```
3. Usage
3.1 Set up Environment
Every time you open a new terminal, you need to set the following environment variable.
```sh
# set up environment
$ cd LLM-Toolset
$ export PYTHONPATH=$PWD:$PYTHONPATH
```
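To avoid repeating this step in every new terminal, you can optionally append the export to your shell profile (adjust the path to your checkout):
```sh
# optional: persist PYTHONPATH across terminals
$ echo "export PYTHONPATH=/path/to/LLM-Toolset:\$PYTHONPATH" >> ~/.bashrc
```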
3.2 Determine the strategy to verify a program
For example, consider the verification task `test/test.c`:
```C
// This file is part of the SV-Benchmarks collection of verification tasks:
// https://github.com/sosy-lab/sv-benchmarks
//
// SPDX-FileCopyrightText: 2011-2020 The SV-Benchmarks community
// SPDX-FileCopyrightText: The CSeq project
//
// SPDX-License-Identifier: Apache-2.0
extern void abort(void);
#include <assert.h>
void reach_error() { assert(0); }
#include <stdlib.h>
#include <pthread.h>
#include <string.h>
void __VERIFIER_assert(int expression) { if (!expression) { ERROR: {reach_error();abort();}}; return; }
char *v;
void *thread1(void *arg)
{
    v = calloc(8, sizeof(char));
    return 0;
}

void *thread2(void *arg)
{
    if (v) strcpy(v, "Bigshot");
    return 0;
}

int main()
{
    pthread_t t1, t2;
    pthread_create(&t1, 0, thread1, 0);
    pthread_create(&t2, 0, thread2, 0);
    pthread_join(t1, 0);
    pthread_join(t2, 0);
    __VERIFIER_assert(!v || v[0] == 'B');
    return 0;
}
```
Run `python3 ./confselect/strategy_selection.py -s ./test/test.c -p ./test/unreach-call.prp -ef -sd ./config/strategy_descriptions.txt`.
Then you can use the LLM to extract the features of the code and specification, and obtain the predicted strategy file `config/svcomp23-pichecker.properties` via a fixed strategy selection order. The output looks like:
```
INFO:root:👻: What are the features of the given source code and specification?
INFO:root:LLM querying ...
👽: Code Features: Concurrency
Specification Features: unreach
INFO:root:total_token: 820
Selected Config: config/svcomp23-pichecker.properties
start time: 2024-02-27 13:56:51.804696, stop time: 2024-02-27 13:56:54.877647
@@@ Finished @@@
```
Run `python3 ./confselect/strategy_selection.py -s ./test/test.c -p ./test/unreach-call.prp -ef -ss -sd ./config/strategy_descriptions.txt -pt fc`.
Then you can use the LLM to extract the features of the code and specification, and then let the LLM itself select the verification strategy, again yielding `config/svcomp23-pichecker.properties`. The output looks like:
```
INFO:root:👻: What are the features of the given source code and specification?
INFO:root:LLM querying ...
👽: Code Features: Concurrency, Pointer, Array Access, Member Variable Access
Specification Features: Unreach
INFO:root:total_token: 790
INFO:root:👻: What is the best strategy for verifying the given program?
INFO:root:LLM querying ...
👽: Selected Config: PIChecker
INFO:root:total_token: 1589
Selected Config: config/svcomp23-pichecker.properties
start time: 2024-02-27 14:05:48.277121, stop time: 2024-02-27 14:05:52.078079
@@@ Finished @@@
```
3.3 Benchmarking
Task definition:
```XML
<?xml version="1.0"?>
<!DOCTYPE benchmark PUBLIC "+//IDN sosy-lab.org//DTD BenchExec benchmark 2.3//EN" "https://www.sosy-lab.org/benchexec/benchmark-2.3.dtd">
<benchmark tool="cfstra" timelimit="15 min" hardtimelimit="16 min" memlimit="15 GB" cpuCores="1" threads="2" >
<require cpuModel="Intel Xeon E3-1230 v5 @ 3.40 GHz"/>
<resultfiles>**/*.graphml</resultfiles>
<option name="-heap">10000M</option>
<option name="-benchmark"/>
<option name="-timelimit">900 s</option>
<option name="-m">gpt-3.5-turbo</option>
<!--<option name="-a">1</option>-->
<option name="-d">1</option>
<option name="-sd">./CFStra/config/strategy-descriptions.txt</option>
<option name="-pt">fc</option>
<rundefinition name="Config-Selection">
<tasks name="ConcurrencySafety-Main">
<includesfile>../sv-benchmarks/c/ConcurrencySafety-Main.set</includesfile>
<propertyfile>../sv-benchmarks/c/properties/unreach-call.prp</propertyfile>
</tasks>
</rundefinition>
<columns>
<column title="UsedConfig">Used algorithm property</column>
</columns>
</benchmark>
```
Run `benchexec --no-container --tool-directory ./CFStra ./cfstra.xml`.
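Once the runs complete, BenchExec's `table-generator` can aggregate the result files into HTML and CSV tables. A typical invocation (the exact result file names depend on the benchmark definition and BenchExec version) is:
```sh
# summarize the BenchExec results (file names may vary)
$ table-generator results/*.results.*.xml.bz2
```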
Files
cfstra.zip (34.3 MB, md5:cca06a98215ae2d844dee18ecf2e3645)