Reproduction package for VMCAI 2024 submission “Augmenting Interpolation-Based Model Checking with Auxiliary Invariants”

Beyer, Dirk; Chien, Po-Chun; Lee, Nian-Ze

doi:10.5281/zenodo.8301927

Published September 8, 2023 | Version VMCAI24-submission

1. LMU Munich, Germany

Reproduction Package

Augmenting Interpolation-Based Model Checking with Auxiliary Invariants

Abstract

This artifact is a reproduction package for the manuscript “Augmenting Interpolation-Based Model Checking with Auxiliary Invariants”, submitted to VMCAI 2024 (paper ID: 84).

It consists of source code, precompiled executables, and input data used in the evaluation of the paper, as well as the obtained experimental results. Specifically, it contains the source code and precompiled binaries of CPAchecker at revision 42901, the executables of 2LS and Symbiotic, the benchmark suite of SV-COMP ’22, the raw and processed data collected from our experiments, and the scripts to reproduce the evaluation results.

This reproduction package works best with the SoSy-Lab Virtual Machine (tested with Oracle VM VirtualBox v6.1), which runs Ubuntu 22.04 LTS and has all the necessary packages to perform the evaluation. (If you test this artifact with the SoSy-Lab VM, you can skip all installation steps below.)

By default, we assign 4 CPU cores and 15 GB of memory to each verification task. A full reproduction of all the experiments requires roughly 3 months of CPU time. For demonstration purposes, a subset of the benchmark tasks can be executed using 1 CPU core and 3 GB of memory, which takes roughly 6 hours of CPU time in total.

Contents

This artifact contains the following items:

README.{html,pdf}: this documentation (the HTML version is recommended)
LICENSE: license information of the artifact
Augmenting_IMC_with_Auxiliary_Invariants.pdf: a preprint of the submitted manuscript
example.c: an example C program for demonstration (see Fig. 1 of the manuscript)
cpachecker/: a directory containing the source code and precompiled binaries of CPAchecker, which implements the proposed approaches
2ls/: a directory containing the executables of 2LS downloaded from the SV-COMP ’22 tool archives
symbiotic/: a directory containing the executables of Symbiotic downloaded from the SV-COMP ’22 tool archives
sv-benchmarks/: a directory containing the SV-COMP ’22 benchmark tasks used in our evaluation
bench-defs/: a directory containing the benchmark definitions of the experiments (used by BenchExec, a framework for reliable benchmarking)
data-submission/: a directory containing the raw and processed data produced from our full evaluation (used in the manuscript, under paper-results/) and from a demo experiment (prepared for this reproduction package, under demo-results/)
Makefile: a file containing commands for running experiments and processing data

This readme file will guide you through the following steps:

Set up evaluation environment
Execute software verifiers
Perform experiments
Analyze experimental data
Known issues of the artifact

TL;DR

Type the following commands in the root folder of the reproduction package to:

Check BenchExec configuration:
- make test-benchexec: test if BenchExec has been installed (see the installation guide)
- make test-cgroups: test if cgroups are configured correctly
Run software verifiers on the example in Fig. 1 of the submission (time limit set to 10 seconds for quick response):
- make timelimit=10s test-cpachecker: the proposed analysis in CPAchecker
- make timelimit=10s test-2ls: the default analysis of 2LS
- make timelimit=10s test-symbiotic: the default analysis of Symbiotic
Perform a demo experiment on 30 tasks: make run-demo-exp
- To quickly check whether the experiment is runnable, use make timelimit=60s memlimit=3GB cpulimit=1 run-demo-exp, which limits each task to 60 seconds of CPU time, 3 GB of memory, and 1 CPU core.
- After the run is finished, an HTML table containing the experimental results can be found in results/demo.table.html
Perform a full experiment: make run-full-exp
- 4 CPU cores, 15 GB of RAM, and 900 seconds of CPU time are given to each task. The full experiment takes roughly 3 months of CPU time.
- After the run is finished, HTML tables containing the experimental results can be found in results/*.table.html

To view the HTML files corresponding to the tables and figures of the paper, please open the following links in a browser. (The links work best in README.html.)

Note that the figures and tables in the manuscript were reformatted to save space, so some of them do not look exactly like their HTML counterparts.

Figure 2(a) (Note that the number of program unrollings reported by CPAchecker has a constant offset of +1, i.e., a number k in the HTML plot corresponds to k-1 in the manuscript.)
Figure 2(b)
Table 1-1 (Note that in the manuscript, we reported the run-time rounded to two significant digits.)
Table 1-2
Table 2
A summary table for Figure 3
Figure 4(a) (Note that in the manuscript, we cropped the first 400 tasks, which were solved within 1 minute, to save space.)
Figure 4(b)
Figure 5(a)
Figure 5(b)
Table 3
Figure 6
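The two notes above (the +1 offset in the unrolling count and the two-significant-digit run-times) can be applied mechanically when comparing the HTML data with the manuscript. The following is a minimal sketch; the function names are hypothetical, and the rounding rule is an assumption, since the README does not state how the manuscript values were rounded.

```python
import math


def manuscript_unrollings(html_k):
    """Convert an unrolling count k shown in the HTML plot to the manuscript's k-1."""
    return html_k - 1


def two_significant_digits(value):
    """Round a run-time to two significant digits (assumed rounding rule)."""
    if value == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(value)))
    return round(value, 1 - exponent)


print(manuscript_unrollings(5))
print(two_significant_digits(123.4))
```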

To improve readability, in the following we only excerpt important fragments of the logs. The complete log messages for the above commands are listed in data-submission/complete-logs.html for reference.

Set Up Evaluation Environment

Hardware Requirements

For the demo experiment, 3 GB of memory and 1 CPU core are allocated to each verification task. For the complete experiment, 15 GB of memory and 4 CPU cores are used. Your machine must provide more memory and CPU cores than a single benchmark task is allocated. An internet connection is not necessary.

Software Requirements

This artifact requires a Linux-based operating system using cgroups v1 and has been tested on a 64-bit Ubuntu 22.04 computer with Linux kernel 5.15.0.

In addition, the following software dependencies are required:

BenchExec 3.17 (installation guide)
Clang 14.0.0
Java Runtime Environment (JRE) 11 or above
libz3-dev 4.8.12
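Before running the make targets, the requirements above can be given a quick preliminary check. The sketch below is only a heuristic and does not replace make test-benchexec and make test-cgroups: it assumes that the tools are installed under the executable names benchexec, clang, and java, and it guesses the cgroup version from the presence of cgroup.controllers at the cgroup root, which the unified (v2) hierarchy provides. It does not check tool versions.

```python
import shutil
from pathlib import Path

# Executable names are assumptions; adjust them to your installation.
REQUIRED_TOOLS = ("benchexec", "clang", "java")


def cgroup_version(cgroup_root="/sys/fs/cgroup"):
    """Guess the cgroup version; return None if the root directory is missing."""
    root = Path(cgroup_root)
    if not root.is_dir():
        return None
    # The unified (v2) hierarchy exposes cgroup.controllers at its root.
    return 2 if (root / "cgroup.controllers").is_file() else 1


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of the tools that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("cgroup version (heuristic):", cgroup_version())
    print("tools not found on PATH:", missing_tools() or "none")
```

If the heuristic reports version 2, the system does not match the cgroups v1 setup this artifact was tested with.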

Execute Software Verifiers

Run Different Verification Algorithms in CPAchecker

To execute CPAchecker on a C program example.c, please run:

make timelimit=10s c-prog=example.c cpa-config=imc_i-df test-cpachecker

You can change the time limit, the input C program, and the configuration by setting the make variables timelimit, c-prog, and cpa-config, respectively. The following configurations are supported:

imc: plain interpolation-based model checking (IMC, McMillan 2003)
imc_f-df: augmented IMC with fixed-point checks strengthened (IMC_f ← DF)
imc_i-df: augmented IMC with interpolants strengthened (IMC_i ← DF)
ki-df: k-Induction boosted by auxiliary invariants (KI ← DF)
impact: Impact (Impact)
pred_abs: Predicate Abstraction (PredAbs)
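To try all six configurations on the same program, the make command can be generated in a loop. The following sketch only prints the commands; the helper make_command is hypothetical, and the variable and target names are the ones shown in this README.

```python
import shlex

# Configuration names as listed in this README.
CPA_CONFIGS = ["imc", "imc_f-df", "imc_i-df", "ki-df", "impact", "pred_abs"]


def make_command(cpa_config, c_prog="example.c", timelimit="10s"):
    """Build the argument list for one `make ... test-cpachecker` call."""
    return [
        "make",
        f"timelimit={timelimit}",
        f"c-prog={c_prog}",
        f"cpa-config={cpa_config}",
        "test-cpachecker",
    ]


for config in CPA_CONFIGS:
    # Print the command instead of running it. To execute it, pass the list
    # to subprocess.run(...) from the root folder of the package.
    print(shlex.join(make_command(config)))
```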

Below is an example output shown on the console after the analysis is finished.

[…redacted…]

Verification result: TRUE. No property violation found by chosen configuration. More details about the verification run can be found in the directory “./output”.

There are three possible verification results:

TRUE: the program is “safe”, i.e., it does not violate the given property
FALSE: the program is “unsafe”, i.e., it contains a violation of the given property
UNKNOWN: the program might contain an unsupported feature, or the analysis ran into an error (timeout, out of memory, etc.)
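When scripting several runs, the verdict can be extracted from the console output. The sketch below relies only on the “Verification result:” line shown in the example output above; the function name is hypothetical, and treating a missing result line as UNKNOWN is an assumption of this sketch.

```python
import re

_RESULT_LINE = re.compile(r"Verification result:\s*(TRUE|FALSE|UNKNOWN)\b")


def parse_verdict(console_output):
    """Return 'TRUE', 'FALSE', or 'UNKNOWN' from CPAchecker console output."""
    match = _RESULT_LINE.search(console_output)
    # A missing result line (e.g., after a crash) is treated as UNKNOWN here.
    return match.group(1) if match else "UNKNOWN"


sample = "Verification result: TRUE. No property violation found by chosen configuration."
print(parse_verdict(sample))
```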

For the program example.c, IMC, IMC_f ← DF, IMC_i ← DF, and PredAbs are able to deliver a proof within 10 seconds, whereas KI ← DF and Impact are not.

Also note that there will be no output/ folder, because CPAchecker is executed with the -noout option (see line 22 of the Makefile).

Run 2LS and Symbiotic

To execute 2LS or Symbiotic on a C program, please run:

make timelimit=10s c-prog=example.c test-2ls # or test-symbiotic

You can change the time limit and the input C program by setting the make variables timelimit and c-prog, respectively.

2LS can prove the program example.c within 10 seconds, while Symbiotic cannot.

Perform Experiments

We provide two settings for the experiments: one for the demo run and the other for the full run. The two settings differ in (1) the set of executed tasks and (2) the executed tools/algorithms. All the other common settings are explained below.

Experimental Settings

The settings are described in the XML files bench-defs/*.xml. These XML files are used by BenchExec, a framework for reliable benchmarking.

For the execution of a task, a default resource limit of 4 CPU cores, 900 seconds of CPU time, and 15 GB of memory is imposed. (If the required amount of memory is not available on your system, please follow the instructions below to adjust the limit.)

The XML files contain the following configurations of the verifiers compared in the evaluation:

CPAchecker
- Compared SMT-based algorithms: imc, imc_f-df, imc_i-df, ki-df, impact, and pred_abs
- Different random seeds for plain and augmented IMC: imc-rs{7,61,89,165} and imc_i-df-rs{7,61,89,165}
2LS: default (configuration used in SV-COMP ’22)
Symbiotic: svcomp (configuration used in SV-COMP ’22)
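The brace notation for the random-seed configurations above abbreviates eight names, four seeds for each of the two algorithms. As a small illustration, the expansion can be written out as follows:

```python
SEEDS = (7, 61, 89, 165)
BASE_CONFIGS = ("imc", "imc_i-df")

# Expand imc-rs{7,61,89,165} and imc_i-df-rs{7,61,89,165} into full names.
run_names = [f"{base}-rs{seed}" for base in BASE_CONFIGS for seed in SEEDS]
print(run_names)
```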

Before you start executing any experiment, please make sure that

BenchExec is successfully installed by running make test-benchexec and
cgroups are correctly configured by running make test-cgroups.

Demo Run on the Selected Tasks

A complete experiment on the whole benchmark suite, which consists of 1623 C-verification tasks (listed in bench-defs/sets/overall.set), is very time-consuming: the elapsed CPU time in our experiment was about 3 months. The experimental data produced from the full evaluation reported in the paper can be found in the folder data-submission/paper-results/.

To show how our experiments were conducted, we selected 30 tasks from the benchmark suite (listed in bench-defs/sets/demo.set) and 3 algorithms (plain and augmented IMC: IMC, IMC_f ← DF, and IMC_i ← DF) in CPAchecker for demonstration.

We emphasize that the demo run is only for demonstration purposes. The observations on the comparison between algorithms and tools in the manuscript were drawn from the evaluation on the whole benchmark suite.

This demonstrative experiment was designed to be feasible with reasonable hardware and time: it can be finished within several hours on a laptop.

To perform the demonstrative experiment, run the command below:

make run-demo-exp

Below is an example of how to adjust the resource limits. To set the time limit to 60 seconds, the memory limit to 3 GB, and the number of CPU cores to 1 per task, run:

make timelimit=60s memlimit=3GB cpulimit=1 run-demo-exp

Moreover, if you have enough hardware resources and would like to run benchmark tasks in parallel, add benchexec-args="-N <num_jobs>" to the make command. For more usage information about BenchExec, please refer to benchexec -h.
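A reasonable value for <num_jobs> follows from the per-task limits: each parallel task needs its own CPU cores and memory. The following sketch computes an upper bound under that assumption; the function name is hypothetical, and the result leaves no headroom for the operating system, so a smaller value may be the safer choice.

```python
def max_parallel_jobs(total_cores, total_mem_gb, cpulimit=4, memlimit_gb=15):
    """Upper bound on parallel tasks, given the per-task core and memory limits."""
    by_cores = total_cores // cpulimit
    by_memory = total_mem_gb // memlimit_gb
    return max(0, min(by_cores, by_memory))


# Default limits on a machine with 16 cores and 64 GB of memory.
print(max_parallel_jobs(16, 64))
# Demo limits (1 core, 3 GB per task) on a machine with 8 cores and 32 GB.
print(max_parallel_jobs(8, 32, cpulimit=1, memlimit_gb=3))
```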

After the run is finished, an HTML table containing the experimental results can be found in results/demo.table.html.

Full Run on the Complete Benchmark Suite

As mentioned above, the total CPU time elapsed for a complete experiment is about 3 months, and 900 seconds of CPU time, 15 GB of memory, and 4 CPU cores are given to each benchmark task.

To perform the full experiment, run the command:

make run-full-exp

The full experiment can be split into three make targets:

make run-aug-imc-exp:

Evaluate IMC, IMC_f ← DF, and IMC_i ← DF on 870 tasks (listed in bench-defs/sets/nontrivial-inv.set) where DF, the invariant generation component in CPAchecker, is able to generate non-trivial inductive invariants. The experimental results are summarized in Fig. 2, Table 1, Table 2, Fig. 4, and Fig. 5 of the manuscript.

make run-rand-seed-exp:

Compare IMC and IMC_i ← DF using different random seeds for SMT solving on the same 870 tasks. The experimental results are summarized in Fig. 3 of the manuscript.

make run-cmp-exp:

Compare IMC_i ← DF against other SMT-based algorithms (KI ← DF, Impact, and PredAbs) in CPAchecker and 2 state-of-the-art verifiers (2LS and Symbiotic) from SV-COMP ’22 on the whole benchmark suite. The experimental results are summarized in Table 3 and Fig. 6 of the manuscript.

After the run is finished, HTML tables containing the experimental results can be found in results/*.table.html.
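To plan a run, a worst-case CPU-time budget can be estimated by assuming that every run uses its full time limit. The sketch below is a rough upper bound, not a measurement: the task counts are taken from this README, while the number of configurations per make target (3, 8, and 6) is an assumption read off the descriptions above and may differ from what the Makefile actually executes.

```python
TIME_LIMIT_S = 900  # default CPU-time limit per task

# make target -> (number of tasks, assumed number of configurations)
RUNS = {
    "run-aug-imc-exp": (870, 3),
    "run-rand-seed-exp": (870, 8),
    "run-cmp-exp": (1623, 6),
}


def worst_case_cpu_days(runs=RUNS, time_limit_s=TIME_LIMIT_S):
    """CPU days needed if every run consumed its full time limit."""
    total_runs = sum(tasks * configs for tasks, configs in runs.values())
    return total_runs * time_limit_s / 86400


print(worst_case_cpu_days())
# Demo run: 30 tasks, 3 algorithms, with the time limit reduced to 60 seconds.
print(worst_case_cpu_days({"run-demo-exp": (30, 3)}, time_limit_s=60))
```

Under these assumptions the bound is about 201 CPU days for the full experiment. The reported elapsed CPU time of about 3 months is lower, as expected when many tasks finish before the time limit.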

Analyze the Experimental Data

We recommend using the interactive HTML files to visualize the results of the experiments. These files can be opened with a web browser (e.g., Firefox) and display the information presented in all tables and figures of the manuscript.

Results from Our Experiments

The results (both raw and processed data) of the demo run and the full run obtained on our machines are in the folders data-submission/demo-results/ and data-submission/paper-results/, respectively. The demo run was performed to prepare this artifact, and the full run was performed to collect the data used in the paper.

The generated HTML files are:

tab1-1.imc_i-df.improvement.table.html: generated by the make-target run-aug-imc-exp (Table 1-1 in the manuscript)
tab1-2.imc_f-df.improvement.table.html: generated by the make-target run-aug-imc-exp (Table 1-2 in the manuscript)
tab2.augmented-imc.summary.table.html: generated by the make-target run-aug-imc-exp (Fig. 2, Table 2, Fig. 4, and Fig. 5 in the manuscript)
fig3.random-seed.table.html: generated by the make-target run-rand-seed-exp (Fig. 3 in the manuscript)
tab3.overall-comparison.table.html: generated by the make-target run-cmp-exp (Table 3 and Fig. 6 in the manuscript)

We also provide pre-configured links to easily view the exact tables/figures as shown in the paper, as listed in the TL;DR section.

If you want to re-generate all the above HTML files from the raw data obtained by our experiments, run make gen-paper-tables. Note that this command will overwrite the existing files.

Navigate Through the Data

Once an experiment is finished, the Makefile automatically collects the results and generates the HTML file, whose path is printed on the console.

A sample output printed at the end of the demo run:

[…redacted…]

INFO: Merging results…
INFO: The resulting table will have 30 rows and 21 columns (in 3 run sets).
INFO: Generating table…
INFO: Writing HTML into /path/to/artifact/results/demo.table.html …
INFO: done

When you open the generated HTML table, you will land on the Summary page of the experiment, which displays the detailed settings of the experiment and a summary table of the compared tools/algorithms. If you open tab2.augmented-imc.summary.table.html, this page shows the number of proofs found by each compared approach, as reported in Table 2.

To see the full table, please navigate to the tab Table. By filtering the status from the drop-down menus, you can see the results of Timeouts, Out of memory, and Other inconclusive of each compared approach as reported in Table 2.

To inspect the log file of an individual task, click on the status of that task. If the log file cannot be displayed, configure your browser according to the printed instructions.

To filter tasks, use the task filter at the upper-right corner of the page. To view quantile plots, navigate to the tab Quantile Plot and adjust the drop-down menus as you prefer. To view scatter plots, navigate to the tab Scatter Plot and adjust the x- and y-axes as needed.

Known Issues of the Artifact

Known issues of this artifact are documented below.

CPU-throttling Warnings

When you perform the demo or full runs (especially on a laptop), BenchExec might raise the following warning:

2023-XX-XX XX:XX:XX - WARNING - CPU throttled itself during benchmarking due to overheating. Benchmark results are unreliable!

This is normal on a laptop. Please ignore it.

Complete Logs

The complete logs produced by each command mentioned above can be found in data-submission/complete-logs.html for reference.

Files

IMCDF-artifact-VMCAI24-submission.zip

Files (1.5 GB)

Name	Size
IMCDF-artifact-VMCAI24-submission.zip (md5:d5043a51f595f18ac771f137b2d0724f)	1.5 GB
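After downloading the archive, its integrity can be checked against the MD5 checksum listed above. The following is a minimal sketch using Python's hashlib; the function name is hypothetical, and the archive is assumed to be in the current directory.

```python
import hashlib

# Checksum as listed for IMCDF-artifact-VMCAI24-submission.zip.
EXPECTED_MD5 = "d5043a51f595f18ac771f137b2d0724f"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The download is intact if md5_of_file("IMCDF-artifact-VMCAI24-submission.zip") == EXPECTED_MD5 evaluates to True.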
