
Published September 8, 2023 | Version VMCAI24-submission

Reproduction package for VMCAI 2024 submission “Augmenting Interpolation-Based Model Checking with Auxiliary Invariants”

  • 1. LMU Munich, Germany

Description

Reproduction Package

Augmenting Interpolation-Based Model Checking with Auxiliary Invariants


Abstract

This artifact is a reproduction package for the manuscript “Augmenting Interpolation-Based Model Checking with Auxiliary Invariants”, submitted to VMCAI 2024 (paper ID: 84).

It consists of source code, precompiled executables, and input data used in the evaluation of the paper, as well as the obtained experimental results. Specifically, it contains the source code and precompiled binaries of CPAchecker at revision 42901, the executables of 2LS and Symbiotic, the benchmark suite of SV-COMP ’22, the raw and processed data collected from our experiments, and the scripts to reproduce the evaluation results.

This reproduction package works best with the SoSy-Lab Virtual Machine (tested with Oracle VM VirtualBox v6.1), which runs Ubuntu 22.04 LTS and has all the necessary packages to perform the evaluation. (If you test this artifact with the SoSy-Lab VM, you can skip all installation steps below.)

By default, we assign 4 CPU cores and 15 GB of memory to each verification task. A full reproduction of all the experiments requires roughly 3 months of CPU time. For demonstration purposes, a subset of the benchmark tasks can be executed using 1 CPU core and 3 GB of memory, which takes roughly 6 hours of CPU time in total.

Contents

This artifact contains the following items:

This readme file will guide you through the following steps:

TL;DR

Type the following commands in the root folder of the reproduction package to:

  • Check BenchExec configuration:
    • make test-benchexec: test if BenchExec has been installed (see the installation guide)
    • make test-cgroups: test if cgroups are configured correctly
  • Run software verifiers on the example in Fig. 1 of the submission (time limit set to 10 seconds for quick response):
    • make timelimit=10s test-cpachecker: the proposed analysis in CPAchecker
    • make timelimit=10s test-2ls: the default analysis of 2LS
    • make timelimit=10s test-symbiotic: the default analysis of Symbiotic
  • Perform a demo experiment on 30 tasks: make run-demo-exp
    • To quickly check if the experiment is runnable, you can use make timelimit=60s memlimit=3GB cpulimit=1 run-demo-exp to limit the CPU time to 60 seconds, the memory to 3 GB, and the number of CPU cores to 1 per task.
    • After the run is finished, an HTML table containing the experimental results can be found in results/demo.table.html
  • Perform a full experiment: make run-full-exp
    • 4 CPU cores, 15 GB of RAM, and 900 seconds of CPU time are given to each task. The full experiment takes roughly 3 months of CPU time.
    • After the run is finished, HTML tables containing the experimental results can be found in results/*.table.html

To view the HTML files corresponding to the tables and figures of the paper, please open the following links with a browser. (The links work best when viewed in README.html.)

Note that the figures and tables in the manuscript are condensed to save space, so some of them do not look exactly like the HTML files.

To improve readability, in the following we only excerpt important fragments of the logs. The complete log messages for the above commands are listed in data-submission/complete-logs.html for reference.

Set Up Evaluation Environment

Hardware Requirements

For the demo experiment, 3 GB of memory and 1 CPU core are allocated to each verification task. For the complete experiment, 15 GB of memory and 4 CPU cores are used per task. Please make sure that your machine provides more resources than a single benchmark task requires. An internet connection is not necessary.

Software Requirements

This artifact requires a Linux-based operating system using cgroups v1 and has been tested on a 64-bit Ubuntu 22.04 computer with Linux kernel 5.15.0.

In addition, the following software dependencies are required:

  • BenchExec 3.17 (installation guide)
  • Clang 14.0.0
  • Java Runtime Environment (JRE) 11 or above
  • libz3-dev 4.8.12
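Before running make test-benchexec and make test-cgroups, which remain the authoritative checks, you may want a quick look at which cgroup hierarchy your kernel uses and whether the tools above are on your PATH. The Python sketch below relies on the convention that a pure cgroups v2 (unified) system exposes /sys/fs/cgroup/cgroup.controllers; the function names and the executable names it looks for (benchexec, clang, java) are our assumptions, since Clang, for example, may be installed as clang-14.

```python
import shutil
from pathlib import Path


def cgroup_version(root: str = "/sys/fs/cgroup") -> str:
    """Heuristically guess the cgroup hierarchy mounted at `root`."""
    base = Path(root)
    if (base / "cgroup.controllers").exists():
        # The unified (v2) hierarchy exposes this file at its root.
        return "v2"
    if base.is_dir():
        # A v1 (or hybrid) setup mounts one directory per controller, e.g. cpu/, memory/.
        return "v1"
    return "unknown"


def missing_tools(tools=("benchexec", "clang", "java"), which=shutil.which):
    """Return the executables from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    print("cgroups:", cgroup_version())
    print("missing tools:", missing_tools() or "none")
```

Since this artifact expects cgroups v1, a result of "v2" suggests that the cgroups configuration needs attention before any experiment is started.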

Execute Software Verifiers

Run Different Verification Algorithms in CPAchecker

To execute CPAchecker on a C program example.c, please run:

make timelimit=10s c-prog=example.c cpa-config=imc_i-df test-cpachecker

You can change the time limit, the input C program, and the configuration by setting the variables timelimit, c-prog, and cpa-config, respectively. The following configurations are supported:

  • imc: plain interpolation-based model checking (IMC, McMillan 2003)
  • imc_f-df: augmented IMC with fixed-point checks strengthened (IMCf ← DF)
  • imc_i-df: augmented IMC with interpolants strengthened (IMCi ← DF)
  • ki-df: k-Induction boosted by auxiliary invariants (KI ← DF)
  • impact: Impact (Impact)
  • pred_abs: Predicate Abstraction (PredAbs)
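To compare all six configurations on the same program, the make target can simply be invoked once per configuration. The Python sketch below only assembles the command lines from the documented variables timelimit, c-prog, and cpa-config; the helper names cpachecker_commands and run_all are hypothetical, and actually executing the commands assumes you are in the root folder of the reproduction package.

```python
import shlex
import subprocess

# Values of cpa-config documented above.
CONFIGS = ("imc", "imc_f-df", "imc_i-df", "ki-df", "impact", "pred_abs")


def cpachecker_commands(program="example.c", timelimit="10s", configs=CONFIGS):
    """Build one `make ... test-cpachecker` command line per configuration."""
    return [
        ["make", f"timelimit={timelimit}", f"c-prog={program}",
         f"cpa-config={config}", "test-cpachecker"]
        for config in configs
    ]


def run_all(run=False, **kwargs):
    """Print each command; execute it only if run=True (from the package root)."""
    for cmd in cpachecker_commands(**kwargs):
        print("$", shlex.join(cmd))
        if run:
            # check=False: a failing or inconclusive analysis does not stop the loop.
            subprocess.run(cmd, check=False)


if __name__ == "__main__":
    run_all(run=False)
```

With run=False the script is a dry run that prints the six commands, which makes it easy to review them before spending CPU time.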

Below is an example output shown on the console after the analysis is finished.

[…redacted…]

Verification result: TRUE. No property violation found by chosen configuration.
More details about the verification run can be found in the directory “./output”.

There are 3 possible outcomes of the verification result:

  1. TRUE: the program is “safe”, i.e., it does not violate the given property
  2. FALSE: the program is “unsafe”, i.e., it violates the given property
  3. UNKNOWN: the program might contain some unsupported feature, or the analysis ran into an error (timeout, out of memory, etc.)
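When runs are scripted, the verdict can be read from the "Verification result:" line shown in the example output above. A minimal Python sketch, assuming that line has exactly the form shown there; anything else, including a missing line after a crash, is mapped to UNKNOWN. The function name verdict is hypothetical.

```python
import re

# Matches e.g. "Verification result: TRUE. No property violation found ..."
_RESULT_RE = re.compile(r"^Verification result: (TRUE|FALSE|UNKNOWN)\b", re.MULTILINE)


def verdict(console_output: str) -> str:
    """Return TRUE, FALSE, or UNKNOWN from CPAchecker's console output."""
    match = _RESULT_RE.search(console_output)
    return match.group(1) if match else "UNKNOWN"
```

Treating a missing result line as UNKNOWN matches the third outcome above, which also covers timeouts and other errors.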

For the program example.c, IMC, IMCf ← DF, IMCi ← DF, and PredAbs are able to deliver a proof within 10 seconds, whereas KI ← DF and Impact are not.

Also note that there will be no output/ folder, because CPAchecker is executed with the -noout option (see line 22 of the Makefile).

Run 2LS and Symbiotic

To execute 2LS or Symbiotic on a C program, please run:

make timelimit=10s c-prog=example.c test-2ls # or test-symbiotic

You can change the time limit and the input C program by passing the arguments to timelimit and c-prog, respectively.

2LS can prove the program example.c within 10 seconds, while Symbiotic cannot.

Perform Experiments

We provide two settings for the experiments: one for the demo run and one for the full run. The two settings differ in (1) the set of executed tasks and (2) the executed tools/algorithms. The settings common to both are explained below.

Experimental Settings

The settings are described in the XML files bench-defs/*.xml. These XML files are used by BenchExec, a framework for reliable benchmarking.

For the execution of a task, a default resource limit of 4 CPU cores, 900 seconds of CPU time, and 15 GB of memory is imposed. (If the required memory amount is not available on your system, please follow the instructions explained below to adjust the limit.)
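To check which limits a benchmark definition sets, you can read the timelimit, memlimit, and cpuCores attributes of its root benchmark element, which is where BenchExec benchmark definitions declare them. The Python sketch below uses only the standard library; whether the files in bench-defs/ set the limits there, rather than receiving overrides from the Makefile on the command line, is an assumption to verify against the actual files, and the helper name limits_of is hypothetical.

```python
import glob
import xml.etree.ElementTree as ET

LIMIT_ATTRIBUTES = ("timelimit", "memlimit", "cpuCores")


def limits_of(root: ET.Element) -> dict:
    """Return the resource-limit attributes set on a BenchExec <benchmark> element."""
    if root.tag != "benchmark":
        raise ValueError(f"expected <benchmark>, got <{root.tag}>")
    # Attributes that are absent from the file map to None.
    return {name: root.get(name) for name in LIMIT_ATTRIBUTES}


if __name__ == "__main__":
    for path in sorted(glob.glob("bench-defs/*.xml")):
        print(path, limits_of(ET.parse(path).getroot()))
```

A None entry only means that the file itself does not fix that limit; the value used at run time may still come from the make variables timelimit, memlimit, and cpulimit.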

The XML files define the following configurations of the verifiers compared in the evaluation:

  • CPAchecker
    • Compared SMT-based algorithms: imc, imc_f-df, imc_i-df, ki-df, impact, and pred_abs
    • Different random seeds for plain and augmented IMC: imc-rs{7,61,89,165} and imc_i-df-rs{7,61,89,165}
  • 2LS: default (configuration used in SV-COMP ’22)
  • Symbiotic: svcomp (configuration used in SV-COMP ’22)

Before you start executing any experiment, please make sure that

  1. BenchExec is successfully installed by running make test-benchexec and
  2. cgroups are correctly configured by running make test-cgroups.

Demo Run on the Selected Tasks

A complete experiment on the whole benchmark suite consisting of 1623 C-verification tasks (listed in bench-defs/sets/overall.set) takes a vast amount of time (the elapsed CPU time in our experiment was about 3 months). The experimental data produced from the full evaluation reported in the paper can be found in folder data-submission/paper-results/.

To show how our experiments were conducted, we selected 30 tasks from the benchmark suite (listed in bench-defs/sets/demo.set) and 3 algorithms (plain and augmented IMC: IMC, IMCf ← DF, and IMCi ← DF) in CPAchecker for demonstration.
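To confirm which tasks a set file selects (for instance, that bench-defs/sets/demo.set yields 30 tasks), its patterns can be expanded by hand. The Python sketch below assumes the usual BenchExec set-file layout: one glob pattern per line, relative to the directory of the set file, with empty lines and lines starting with # ignored. It is not BenchExec's own implementation, and the function name expand_set_file is hypothetical.

```python
import glob
import os


def expand_set_file(set_path: str) -> list:
    """Expand the glob patterns in a BenchExec-style .set file into file paths."""
    base = os.path.dirname(os.path.abspath(set_path))
    tasks = []
    with open(set_path, encoding="utf-8") as f:
        for line in f:
            pattern = line.strip()
            if not pattern or pattern.startswith("#"):
                continue  # skip blank lines and comments
            # Patterns are resolved relative to the set file's directory.
            tasks.extend(sorted(glob.glob(os.path.join(base, pattern))))
    return tasks


if __name__ == "__main__":
    demo_set = "bench-defs/sets/demo.set"
    if os.path.exists(demo_set):
        print(len(expand_set_file(demo_set)), "tasks in", demo_set)
```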

We emphasize that the demo run is only for demonstration purposes. The observations on the comparison between algorithms and tools in the manuscript were drawn from the evaluation on the whole benchmark suite.

The demo experiment is designed to be feasible with reasonable hardware and time: it can be finished within several hours on a laptop.

To perform the demo experiment, run the command below:

make run-demo-exp

The following example shows how to adjust the resource limits. To set the time limit to 60 seconds and the memory limit to 3 GB, and to use only 1 CPU core per task, run:

make timelimit=60s memlimit=3GB cpulimit=1 run-demo-exp

Moreover, if you have enough hardware resources and would like to launch parallel benchmark tasks, add benchexec-args="-N <num_jobs>" to the make command. For more usage information about BenchExec, please refer to benchexec -h.
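A reasonable value for -N follows from dividing the machine's cores and memory by the per-task limits and taking the smaller result. The arithmetic is sketched below in Python; the function name and the 2 GB memory reserve for the operating system and BenchExec itself are our assumptions, not values from this artifact, so treat the result as an upper bound.

```python
def max_parallel_jobs(total_cores: int, total_mem_gb: float,
                      cpulimit: int, memlimit_gb: float,
                      reserve_mem_gb: float = 2.0) -> int:
    """Upper bound for `benchexec -N`: how many tasks fit side by side."""
    by_cores = total_cores // cpulimit
    # Keep some memory for the system itself (assumed reserve, adjust as needed).
    by_memory = int((total_mem_gb - reserve_mem_gb) // memlimit_gb)
    return max(0, min(by_cores, by_memory))
```

For example, on a machine with 16 cores and 64 GB of RAM, the default limits (4 cores, 15 GB) give max_parallel_jobs(16, 64, 4, 15) == 4, i.e., benchexec-args="-N 4", whereas the reduced demo limits (1 core, 3 GB) give 16.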

After the run is finished, an HTML table containing the experimental results can be found in results/demo.table.html.

Full Run on the Complete Benchmark Suite

As mentioned above, the total CPU time elapsed for a complete experiment is about 3 months, and 900 seconds of CPU time, 15 GB of memory, and 4 CPU cores are given to each benchmark task.

To perform the full experiment, run the command:

make run-full-exp

The full experiment can be split into the following 3 make targets:

  • make run-aug-imc-exp:

    Evaluate IMC, IMCf ← DF, and IMCi ← DF on 870 tasks (listed in bench-defs/sets/nontrivial-inv.set) where DF, the invariant generation component in CPAchecker, is able to generate non-trivial inductive invariants. The experimental results are summarized in Fig. 2, Table 1, Table 2, Fig. 4, and Fig. 5 of the manuscript.

  • make run-rand-seed-exp:

    Compare IMC and IMCi ← DF using different random seeds for SMT solving on 870 tasks. The experimental results are summarized in Fig. 3 of the manuscript.

  • make run-cmp-exp:

    Compare IMCi ← DF against other SMT-based algorithms (KI ← DF, Impact, and PredAbs) in CPAchecker and 2 state-of-the-art verifiers (2LS and Symbiotic) from SV-COMP ’22 on the whole benchmark suite. The experimental results are summarized in Table 3 and Fig. 6 of the manuscript.
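If you prefer to run the full experiment piecewise, the three targets can be invoked one after another. The Python sketch below runs them in sequence and stops at the first failing target; the mapping to the figures and tables of the manuscript is copied from the list above, while the script itself, its helper name run_targets, and its dry_run switch are hypothetical additions.

```python
import subprocess

# make target -> figures/tables of the manuscript it feeds (from the list above)
TARGETS = {
    "run-aug-imc-exp": "Fig. 2, Table 1, Table 2, Fig. 4, Fig. 5",
    "run-rand-seed-exp": "Fig. 3",
    "run-cmp-exp": "Table 3, Fig. 6",
}


def run_targets(targets=TARGETS, dry_run=True):
    """Run each make target in order; with check=True a failure raises CalledProcessError."""
    executed = []
    for target, results in targets.items():
        cmd = ["make", target]
        print(f"$ {' '.join(cmd)}  # data for {results}")
        if not dry_run:
            subprocess.run(cmd, check=True)
        executed.append(cmd)
    return executed


if __name__ == "__main__":
    run_targets(dry_run=True)
```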

After the run is finished, HTML tables containing the experimental results can be found in results/*.table.html.

Analyze the Experimental Data

We recommend using the interactive HTML files to visualize the results of the experiments. These files can be opened with a web browser (e.g., Firefox) and display the information presented in all tables and figures of the manuscript.

Results from Our Experiments

The results (both raw and processed data) of the demo run and the full run obtained on our machines are in the folders data-submission/demo-results/ and data-submission/paper-results/, respectively. The demo run was performed to prepare this artifact, and the full run was performed to collect the data used in the paper.

The generated HTML files are:

We also provide pre-configured links to easily view the exact tables/figures as shown in the paper, as listed in the TL;DR section.

If you want to re-generate all the above HTML files from the raw data obtained by our experiments, run make gen-paper-tables. Note that this command will overwrite the existing files.

Navigate Through the Data

Once an experiment is finished, the Makefile automatically collects the results and generates the HTML file, whose path is printed on the console.

A sample output printed at the end of demo run:

[…redacted…]

INFO: Merging results…
INFO: The resulting table will have 30 rows and 21 columns (in 3 run sets).
INFO: Generating table…
INFO: Writing HTML into /path/to/artifact/results/demo.table.html …
INFO: done
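When the experiment runs unattended, the path of the generated table and its size can be extracted from these INFO lines. The Python sketch below assumes the messages look exactly as in the sample above; the regular expressions are tailored to that sample and may need adjusting for other BenchExec versions, and the function name summarize_table_log is hypothetical.

```python
import re


def summarize_table_log(log: str) -> dict:
    """Extract table dimensions and the HTML path from table-generator INFO lines."""
    info = {}
    size = re.search(r"will have (\d+) rows and (\d+) columns \(in (\d+) run sets\)", log)
    if size:
        info["rows"], info["columns"], info["run_sets"] = map(int, size.groups())
    path = re.search(r"Writing HTML into (\S+)", log)
    if path:
        info["html"] = path.group(1)
    return info
```

For the demo run, 30 rows and 3 run sets correspond to the 30 selected tasks and the three compared algorithms (IMC, IMCf ← DF, and IMCi ← DF).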

When you open the generated HTML table, you are guided to the Summary page of the experiment, which displays the detailed settings of the experiment and a summary table of the compared tools/algorithms. For example, tab2.augmented-imc.summary.table.html shows the number of proofs found by each compared approach, as reported in Table 2.

To see the full table, navigate to the tab Table. By filtering the status with the drop-down menus, you can see the Timeouts, Out of memory, and Other inconclusive results of each compared approach, as reported in Table 2.
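The same status counts can also be obtained without a browser by reading BenchExec's raw result files. The Python sketch below assumes the usual BenchExec layout, in which each run element of a bzip2-compressed *.results.*.xml.bz2 file has a column child with title="status"; the file pattern and status strings such as TIMEOUT or OUT OF MEMORY should be checked against the files in results/, and the function name status_counts is hypothetical.

```python
import bz2
import glob
import xml.etree.ElementTree as ET
from collections import Counter


def status_counts(path: str) -> Counter:
    """Count the status values of all runs in one BenchExec result file."""
    opener = bz2.open if path.endswith(".bz2") else open
    with opener(path, "rb") as f:
        root = ET.parse(f).getroot()
    counts = Counter()
    for run in root.iter("run"):
        # Only the direct <column> children of a run carry its measured values.
        for column in run.findall("column"):
            if column.get("title") == "status":
                counts[column.get("value")] += 1
    return counts


if __name__ == "__main__":
    for path in sorted(glob.glob("results/*.results.*.xml.bz2")):
        print(path, dict(status_counts(path)))
```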

To inspect the log file of an individual task, click on the status of that task. If the log file cannot be displayed, configure your browser according to the printed instructions.

To filter tasks, use the task filter in the upper-right corner of the page. To view quantile plots, navigate to the tab Quantile Plot and adjust the drop-down menus as you prefer. To view scatter plots, navigate to the tab Scatter Plot and choose the x- and y-axes according to your interests.

Known Issues of the Artifact

Known issues of this artifact are documented below.

CPU-throttling Warnings

When you perform the demo or full runs (especially on a laptop), BenchExec might raise the following warning:

2023-XX-XX XX:XX:XX - WARNING - CPU throttled itself during benchmarking due to overheating. Benchmark results are unreliable!

This is normal on a laptop. Please ignore it.

Complete Logs

The complete logs produced by each command mentioned above can be found in data-submission/complete-logs.html for reference.

Files

IMCDF-artifact-VMCAI24-submission.zip (1.5 GB)
md5:d5043a51f595f18ac771f137b2d0724f
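To check that the 1.5 GB archive was downloaded completely, compare its MD5 checksum with the value listed above. On Linux, md5sum IMCDF-artifact-VMCAI24-submission.zip does this directly; the Python sketch below is an equivalent that streams the file in chunks. It assumes the archive sits in the current directory under its original name.

```python
import hashlib
import os

EXPECTED_MD5 = "d5043a51f595f18ac771f137b2d0724f"


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file without loading it into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    zip_path = "IMCDF-artifact-VMCAI24-submission.zip"
    if os.path.exists(zip_path):
        print("OK" if md5sum(zip_path) == EXPECTED_MD5 else "checksum mismatch")
```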