Published February 21, 2020 | Version tacas20

Replication Artifact for Article 'Software Verification with PDR: An Implementation of the State of the Art'

  • Dirk Beyer (LMU Munich)
  • Matthias Dangl (LMU Munich)

Description

This is a replication package for the article "Software Verification with PDR: An Implementation of the State of the Art" by Dirk Beyer and Matthias Dangl.


# Tools

## CPAchecker
CPAchecker in revision 27742 from the trunk can be found in
  ./CPAchecker
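If you prefer to check out the sources yourself, this revision can be obtained
from the CPAchecker repository; the following command is a sketch, assuming
the usual trunk URL of the project:
  svn checkout -r 27742 https://svn.sosy-lab.org/software/cpachecker/trunk CPAchecker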

## Vvt
Vvt can be obtained from the maintainers at FORSYTE via the following URL:
  http://vvt.forsyte.at/releases/vvt-svcomp.tar.xz
We have included this version of Vvt in this artifact.
This is the version that was used in SV-COMP'16;
the latest version is available via GitHub, but performs worse on our
benchmarks due to bugs:
  https://github.com/hguenther/vvt
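If you prefer to fetch and unpack the tarball yourself instead of using the
included copy, the following commands are a sketch (assuming wget and xz
support in tar are available):
  wget http://vvt.forsyte.at/releases/vvt-svcomp.tar.xz
  tar -xJf vvt-svcomp.tar.xz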

## SeaHorn
SeaHorn can be obtained via GitHub at the following URL:
  https://github.com/seahorn/seahorn
We used SeaHorn in version F16-0.1.0-rc3.
We have included this version of SeaHorn in this artifact.
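If you want to obtain SeaHorn yourself instead of using the included copy, a
checkout could look as follows (a sketch, assuming that version F16-0.1.0-rc3
is available as a git tag of that name):
  git clone https://github.com/seahorn/seahorn seahorn
  cd seahorn
  git checkout F16-0.1.0-rc3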

## VeriAbs
VeriAbs is not included in our artifact, because we used the publicly
available results of SV-COMP'19 instead of executing it ourselves.

## BenchExec
For our experiments, we used BenchExec in revision a9202161fde1274056376814c33e8b2960b0e812 from GitHub:
  https://github.com/sosy-lab/benchexec/
At the time of writing the article, this was the latest version.
We could not use an official release because we required a specific bug fix
for the tool-info module of Vvt.
We have included this version of BenchExec in this artifact.
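If you want to obtain exactly this revision yourself instead of using the
included copy, you can pin the checkout to the commit hash:
  git clone https://github.com/sosy-lab/benchexec/ benchexec
  cd benchexec
  git checkout a9202161fde1274056376814c33e8b2960b0e812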

# Benchmarks

## SV-COMP benchmark set
The SV-COMP benchmark set is available via GitHub:
  https://github.com/sosy-lab/sv-benchmarks
We used the svcomp18 release of the benchmark set,
except for Table 8, where we used the svcomp19 release.
Note that all test-set definitions we use depend on the SV-COMP benchmark
set, either completely (tasks and specification) or at least for the
specification.
In the general layout of this supplementary archive,
it is expected to be checked out as:
  ./sv-benchmarks/
We have already prepared this in the layout of this artifact.
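If you need to recreate this layout from scratch, the releases can be fetched
via git (a sketch, assuming that svcomp18 and svcomp19 are the tag names of
the releases):
  git clone --branch svcomp18 --depth 1 https://github.com/sosy-lab/sv-benchmarks sv-benchmarks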

## Hand-crafted tasks
The files in
  ./hand-crafted/
contain the hand-crafted examples discussed in the article.

# Running the experiments

## Note
Since running all tools and configurations on the whole benchmark set on a
single machine (instead of distributing the execution over a computing
cluster, as we did for our experiments) would take several months,
we recommend executing the tools on selected tasks only.
If you want to run one or several configurations on all benchmark tasks
anyway, you can use the provided benchmark definitions with BenchExec, which
is part of this artifact, including its documentation.
As a reduced benchmark set, we recommend the set used for Table 2 of our
article, which corresponds to the benchmark definition file
  ./CPAchecker/test/test-sets/pdr-inv-handcrafted.xml

## Preparing the execution environment
Before running BenchExec, ensure that cgroups are configured correctly
(not all of the following cgroup hierarchies may exist on your machine, and
the user/session slice names may differ; adjust the paths accordingly):
  sudo chmod o+wt '/sys/fs/cgroup/cpuset/'
  sudo chmod o+wt '/sys/fs/cgroup/cpu,cpuacct/'
  sudo chmod o+wt '/sys/fs/cgroup/memory/'
  sudo chmod o+wt '/sys/fs/cgroup/cpu,cpuacct/user.slice'
  sudo chmod o+wt '/sys/fs/cgroup/freezer/'
  sudo chmod o+wt '/sys/fs/cgroup/memory/user.slice/user-1000.slice/session-1.scope'
If your system uses swap, you should also disable it to ensure proper resource accounting:
  sudo swapoff -a
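BenchExec also ships a helper that checks whether the cgroup setup is usable;
running it before the experiments can catch configuration problems early
(a sketch, assuming you run it from the root directory of this artifact with
the included BenchExec):
  PYTHONPATH=./benchexec/ python3 -m benchexec.check_cgroups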

## Resources
Our experiments require at least 15 GB of RAM and two CPU cores.
If you cannot provide such an environment (e.g. by adjusting the settings of a VM),
you can edit the benchmark definitions to scale down the requirements.
The benchmark definitions are the XML files used in the following section.
In the header of each file, you can adjust the resource requirements.
For example, you can change
  (...) memlimit="15 GB" cpuCores="2" (...)
to
  (...) memlimit="1500 MB" cpuCores="1" (...)
Please note that changing these requirements may significantly affect the benchmark results.
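For example, the following sed invocation applies the reduced limits to the
recommended benchmark definition (a sketch; adjust the file name to the
definition you actually want to run):
  sed -i -e 's/memlimit="15 GB"/memlimit="1500 MB"/' \
         -e 's/cpuCores="2"/cpuCores="1"/' \
         ./CPAchecker/test/test-sets/pdr-inv-handcrafted.xml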

## Running CPAchecker
To run the CPAchecker experiments,
  cd ./CPAchecker
and execute
  PYTHONPATH=../benchexec/ PATH=../benchexec/bin/:$PATH ../benchmark.py --no-container ../CPAchecker/test/test-sets/*.xml
To only run the experiments for the set of hand-crafted tasks (recommended), run
  PYTHONPATH=../benchexec/ PATH=../benchexec/bin/:$PATH ../benchmark.py --no-container ../CPAchecker/test/test-sets/pdr-inv-handcrafted.xml

## Running Vvt
To run the Vvt experiments,
  cd ./vvt/bin
and execute
  PYTHONPATH=../../benchexec/ PATH=../../benchexec/bin/:$PATH ../../benchmark.py --no-container ../../vvt/test/test-sets/vvt.xml

## Running SeaHorn
To run the SeaHorn experiments,
  cd ./seahorn/bin
and execute
  PYTHONPATH=../../benchexec/ PATH=../../benchexec/bin/:$PATH ../../benchmark.py --no-container ../../seahorn/test/test-sets/seahorn.xml

# Tested execution environment
For our experiments, we executed the chosen software verifiers on machines
with one 3.4 GHz CPU (Intel Xeon E3-1230 v5) with 8 processing units and
33 GB of RAM each. The operating system was Ubuntu 16.04 (64 bit), using
Linux 4.4 and OpenJDK 1.8. We limited each verification run to two CPU cores,
a CPU run time of 15 min, and a memory usage of 15 GB. To ensure reliable and
accurate measurements, we used the benchmarking framework BenchExec
to conduct our experiments.

# Reproducing figures
The BenchExec documentation also gives examples of how to turn benchmark
results into LaTeX plots, which is how we generated our figures.
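As a starting point, BenchExec's table-generator can turn the result files of
a benchmark run into HTML and CSV tables (a sketch; the results path is an
assumption and should be adjusted to wherever your result XML files were
written):
  PYTHONPATH=./benchexec/ ./benchexec/bin/table-generator results/*.xml.bz2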


Files

artifact.zip (1.2 GB)
md5:72c05e943e8569a5c3236702d2d6c710

Additional details

Related works

Is supplement to: Conference paper, DOI 10.1007/978-3-030-45190-5_1