Published June 10, 2024 | Version FSE24-proceedings

Reproduction Package for FSE 2024 Article "Decomposing Software Verification Using Distributed Summary Synthesis"

Description

Distributed Summary Synthesis

We distribute the verification of a single task by dividing it into smaller verification tasks. The workers communicate new preconditions and violation conditions through messages.
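As a toy illustration of this message-passing scheme (the block structure, the `analyze` callback, and the fixpoint loop below are simplified assumptions for illustration, not the actual CPAchecker implementation), the exchange of messages between blocks can be sketched as:

```python
from collections import deque

# Toy model: each block receives precondition messages from its
# predecessors; analysis of a block yields a postcondition summary and,
# possibly, a violation condition. Work converges when no block
# produces a new message. (Backward propagation of violation
# conditions is omitted here for brevity.)

def distribute(blocks, successors, analyze):
    """blocks: list of block ids; successors: id -> list of ids;
    analyze(block, preconditions) -> (postcondition, violation or None)."""
    pre = {b: set() for b in blocks}   # preconditions received so far
    queue = deque(blocks)              # blocks with pending work
    violations = set()
    while queue:
        b = queue.popleft()
        post, violation = analyze(b, pre[b])
        if violation is not None:
            violations.add(violation)
        for s in successors.get(b, []):
            if post not in pre[s]:     # a new BLOCK_POSTCONDITION message
                pre[s].add(post)
                queue.append(s)
    return violations                  # empty set => proof found
```

In this sketch, an empty result corresponds to the FOUND_RESULT/proof case described below: no new violation conditions emerge and the loop terminates.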

  • VM username: vagrant
  • VM password: vagrant

System Requirements

The artifact requires 8 CPU cores and 16 GB of RAM, plus 15 GB of free disk space. The VM was tested on Ubuntu 22.04 with VirtualBox 7.0.10 r158379 (Qt 5.15.3).

Reproduction

Reproduce the Example in the Paper

Navigate to ~/DSS and execute ./example.sh test/programs/block_analysis/abstraction_safe.c.

CPAchecker decomposes the example program into blocks and verifies it using DSS. The example program is located at ~/DSS/cpachecker/test/programs/block_analysis/abstraction_safe.c. After DSS finishes, an HTML page containing the block graph and a table of messages opens automatically. The paper uses a simplified version of the block graph; the idea and the verification work as described there. If the page does not open automatically, execute open ~/DSS/cpachecker/output/block_analysis/visualized/report.html.

Red messages represent violation conditions (ERROR_CONDITION). Yellow messages represent preconditions (BLOCK_POSTCONDITION). The column of a message indicates which block sent it. On the far left, the elapsed time in nanoseconds since the start of the execution is displayed.

We observe that blocks “L1” and “L2” send the summary “x = y”. No new violation conditions emerge; therefore, DSS finds a proof (FOUND_RESULT in green).

We expect the last messages of “L1” and “L2” in the automatically opened browser tab to look like this:

↓ React to message from <SNIP> (ID: <SNIP>):

Calculated new BLOCK_POSTCONDITION message for <SNIP>

{"readable":"(`=_T(18)` main::x@1 main::y@1)"}

The last lines in the terminal should look like this:

<--SNIP-->
Starting analysis ... (CPAchecker.runAlgorithm, INFO)

Starting block analysis... (BlockSummaryAnalysis.run, INFO)

Decomposed CFA in 6 blocks using the MERGE_DECOMPOSITION. (BlockSummaryAnalysis.run, INFO)

Block analysis finished. (BlockSummaryAnalysis.run, INFO)

Stopping analysis ... (CPAchecker.runAlgorithm, INFO)

Verification result: TRUE. No property violation found by chosen configuration.
More details about the verification run can be found in the directory "./output".
Graphical representation included in the file "./output/Report.html".

The script ~/DSS/more_examples.sh runs DSS on further examples and opens the visual output afterwards. For programs whose filename ends in "_safe", DSS should finish with

Verification result: TRUE. No property violation found by chosen configuration.

and for programs whose filename ends in "_unsafe" with

Verification result: FALSE. Property violation found by chosen configuration.
More details about the verification run can be found in the directory "./output".

Reproduce the Plots in the Paper

Navigate to ~/DSS and execute ./reproduce-plots.sh.

The script runs ./evaluation.py paper-csvs on our original data in paper-csvs. Shortly after, the directory ~/DSS/plots should open automatically. It contains all reproduced plots; the filenames match the figure/table numbers in the paper.

For a quick comparison, we copied the original plots to ~/DSS/paper-plots. Note that rerunning the script forcibly removes the directory ~/DSS/plots before reproducing the plots again.
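To inspect one of these CSVs manually, a per-category row count can be computed with a few lines of Python (the column names and sample rows below are illustrative assumptions; check the actual header of the files in paper-csvs):

```python
import csv
import io

def count_results(csv_text, category="SoftwareSystems"):
    """Count rows belonging to one benchmark category.
    Assumes a 'category' column; the real CSVs may name it differently."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(1 for row in reader if row["category"] == category)

# Illustrative data only; the artifact's CSVs contain many more columns.
sample = """task,category,status,cputime (s)
aws_add_size_checked_harness.yml,SoftwareSystems,true,18.74
sch311x_wdt.ko.yml,SoftwareSystems,true,42.26
example.yml,ReachSafety,false,1.00
"""
print(count_results(sample))  # prints 2
```

Applied to the artifact's files, a count like this corresponds to the "has 2549 results for SoftwareSystems" lines printed by evaluation.py.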

We expect the following output in the terminal:

=======Loading CSVs=======
paper-csvs/forward2.csv has 2549 results for SoftwareSystems.
paper-csvs/dcpa4.csv has 2549 results for SoftwareSystems.
paper-csvs/dcpa8.csv has 2549 results for SoftwareSystems.
paper-csvs/kind.csv has 2549 results for SoftwareSystems.
paper-csvs/dcpa2.csv has 2549 results for SoftwareSystems.
paper-csvs/dcpa1.csv has 2549 results for SoftwareSystems.
paper-csvs/imc.csv has 2549 results for SoftwareSystems.
==========================

Task with most threads (unsolved): 751
Task with most threads (solved): 476 

===Analysis of Overhead===
dcpa8backward analysis time (s)    45.719659
dcpa8decomposition time (s)         0.078870
dcpa8deserialization time (s)      53.193505
dcpa8forward analysis time (s)     21.183128
dcpa8instrumentation time (s)       0.081529
dcpa8proceed time (s)               6.998556
dcpa8serialization time (s)         1.064093
dcpa8cputime (s)                   88.211899
overhead                           54.417997
dtype: float64

Data in sections 'Communication Model' and 'Choice of Decomposition':
Decomposition takes 0.18183334764521225 % of overall time
Packing takes 1.2062918037371835 % of overall time
Unpacking takes 60.30196127528197 % of overall time
==========================

Removed 64 tasks because they contain unsupported features.

Max. speed-up in parallel portfolio 15.912337391444877 

=======Unique tasks=======
ERROR                58
TIMEOUT               2
ERROR (recursion)     1
Name: imcstatus, dtype: int64
DCPA solved 61 tasks uniquely, compared to IMC
ERROR (recursion)    5
TIMEOUT              2
Name: kindstatus, dtype: int64
DCPA solved 7 tasks uniquely, compared to k-Induction
TIMEOUT              9
ERROR (recursion)    4
OUT OF MEMORY        1
Name: forward2status, dtype: int64
DCPA solved 14 tasks uniquely, compared to predicate analysis
==========================
Plots are reproduced and named accordingly

Note that 2549 - 64 = 2485 equals the number of benchmark tasks we describe in the paper.

Reproduce the Experiments

Navigate to ~/DSS and execute ./reproduce-all.sh.

After about a month of computation, the directory containing all plots, named according to the respective figure/table numbers in the paper, should open automatically. We store the plots in ~/DSS/plots.

BenchExec stores the raw data of the benchmarks in ~/DSS/cpachecker/test/results. The CSVs of the raw data are stored in ~/DSS/csvs-reproduced. To reproduce the plots from the newly obtained raw data without executing the benchmarks again, run ./evaluation.py csvs-reproduced from ~/DSS.

ATTENTION: Rerunning ./reproduce-all.sh removes all results in ~/DSS/csvs-reproduced and ~/DSS/cpachecker/test/results and the progress will be lost.

Thirty seconds after executing the shell script, the output in the terminal should be similar to:

vagrant@vagrant:~/DSS$ ./reproduce-all.sh 
2024-04-30 22:21:02 - WARNING - Ignoring specified resource requirements in local-execution mode, only resource limits are used.
2024-04-30 22:21:02 - INFO - Unable to find pqos_wrapper, please install it for cache allocation and monitoring if your CPU supports Intel RDT (cf. https://gitlab.com/sosy-lab/software/pqos-wrapper).

executing run set 'DCPA1'     (2983 files)
2024-04-30 22:21:02 - INFO - LXCFS is not available, some host information like the uptime leaks into the container.
22:21:02   aws-c-common/aws_add_size_checked_harness.yml                                                                                                                                      EXCEPTION                   18.74   18.90
<--SNIP-->               

Reproduce a Subset

Navigate to ~/DSS and execute ./reproduce-selection.sh.

The script runs the full pipeline of our evaluation on a small subset of 11 tasks. This selection illustrates that, in general, the work distributes better when more cores are available and that some tasks are solved faster by DSS than by standard predicate analysis (and vice versa). Additionally, it creates all plots from the newly obtained data.
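Speed-up here means the ratio of the time of a baseline run to that of a parallel run; as a minimal sketch (the function and the example times are illustrative, not taken from the evaluation script):

```python
def speedup(baseline_s, parallel_s):
    """Speed-up of a parallel run over a baseline run, both in seconds."""
    if parallel_s <= 0:
        raise ValueError("parallel time must be positive")
    return baseline_s / parallel_s

# Hypothetical wall times for one task at 1 and 8 worker threads.
print(speedup(160.0, 40.0))  # prints 4.0
```

A value above 1.0 means the parallel configuration solved the task faster than the baseline; a value below 1.0 means it was slower.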

After approximately 30 minutes, the directory containing all plots opens automatically. The plots are named after the figure/table numbers in the paper. We store the plots in ~/DSS/plots.

The selection contains one unsafe task (a task where the function reach_error is indeed reachable) to illustrate DSS’s capability of finding violations. However, as stated in the paper, the evaluation focuses on safe programs. Therefore, the unsafe task does not appear in the generated plots/tables.

BenchExec stores the raw data of the benchmarks in ~/DSS/cpachecker/test/results. The CSVs of the raw data are stored in ~/DSS/csvs-selected. To reproduce the plots from the newly obtained raw data without executing the benchmarks again, run ./evaluation.py csvs-selected from ~/DSS.

ATTENTION: Rerunning ./reproduce-selection.sh removes all results in ~/DSS/csvs-selected and ~/DSS/cpachecker/test/results and the progress will be lost.

Ninety seconds after executing the shell script, the output in the terminal should be similar to:

vagrant@vagrant:~/DSS$ ./reproduce-selection.sh 
2024-04-30 21:59:46 - WARNING - Ignoring specified resource requirements in local-execution mode, only resource limits are used.
2024-04-30 21:59:46 - INFO - Unable to find pqos_wrapper, please install it for cache allocation and monitoring if your CPU supports Intel RDT (cf. https://gitlab.com/sosy-lab/software/pqos-wrapper).

executing run set 'DCPA1.ReachSafety-Selection'     (11 files)
2024-04-30 21:59:46 - INFO - LXCFS is not available, some host information like the uptime leaks into the container.
21:59:46   ldv-linux-3.4-simple/43_1a_cilled_ok_nondet_linux-43_1a-drivers--char--tpm--tpm_nsc.ko-ldv_main0_sequence_infinite_withcheck_stateful.cil.out.yml                            true                        36.69   37.13
22:00:23   ldv-linux-3.4-simple/43_1a_cilled_ok_nondet_linux-43_1a-drivers--watchdog--sch311x_wdt.ko-ldv_main0_sequence_infinite_withcheck_stateful.cil.out.yml                         true                        42.26   43.08
22:01:07   ldv-linux-3.4-simple/43_1a_cilled_ok_nondet_linux-43_1a-drivers--watchdog--it8712f_wdt.ko-ldv_main0_sequence_infinite_withcheck_stateful.cil.out.yml                         true
<--SNIP-->               

Interpret the Data

BenchExec

BenchExec produces XML result files that record the memory usage, the wall time, and the CPU time consumed per task. Since the raw XML is hard for humans to read, we create CSV files in ~/DSS/csvs. There, the data is organized in a table whose columns list the consumed resources for each task (row).
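The flattening from XML to CSV can be sketched as follows (the XML below is a hand-written, minimal stand-in; real BenchExec result files contain considerably more metadata, and the exact attribute set used here is an assumption):

```python
import csv
import io
import xml.etree.ElementTree as ET

# Minimal stand-in for a BenchExec results file; real files carry more
# attributes (options, resource limits, host information).
results_xml = """
<result benchmarkname="dcpa8">
  <run name="task1.yml">
    <column title="status" value="true"/>
    <column title="cputime" value="18.7s"/>
    <column title="walltime" value="19.0s"/>
  </run>
  <run name="task2.yml">
    <column title="status" value="TIMEOUT"/>
    <column title="cputime" value="900.0s"/>
    <column title="walltime" value="905.2s"/>
  </run>
</result>
"""

def xml_to_csv(xml_text):
    """Flatten <run>/<column> elements into CSV text, one row per task."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["task", "status", "cputime", "walltime"])
    for run in root.iter("run"):
        cols = {c.get("title"): c.get("value") for c in run.iter("column")}
        writer.writerow([run.get("name"), cols.get("status"),
                         cols.get("cputime"), cols.get("walltime")])
    return out.getvalue()

print(xml_to_csv(results_xml))
```

Each row of the resulting CSV then corresponds to one task, with one column per consumed resource, as described above.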

Plots

The plots as described in the paper are stored in ~/DSS/paper-plots or can be found in ~/DSS/plots after running a reproduction script.

Artifact Structure

All tools and data are stored in ~/DSS. The VM comes with pre-installed and pre-configured BenchExec 3.16. CPAchecker is installed in ~/DSS/cpachecker. The SV-COMP benchmark set is cloned into ~/DSS/sv-benchmarks. Initially, ~/DSS contains the following files:

  • ~/DSS/evaluation.py: Python script to reproduce the plots.
  • ~/DSS/example.sh: Shell script to reproduce the example in the paper with CPAchecker.
  • ~/DSS/LICENSE: LICENSE of the artifact.
  • ~/DSS/ReadMe.md: This file.
  • ~/DSS/removed.txt: The 64 tasks excluded from the benchmark set due to unsupported features.
  • ~/DSS/reproduce-all.sh: Script to reproduce the full evaluation.
  • ~/DSS/reproduce-selection.sh: Script to reproduce parts of our evaluation.
  • ~/DSS/reproduce-plots.sh: Script to reproduce all plots on the raw data of our runs.
  • ~/DSS/requirements.md: Lists all requirements and shows which commands we executed to create this VM.
  • ~/DSS/requirements.txt: List of required and already installed Python packages.
  • ~/DSS/software-systems.csv: All tasks belonging to the software-systems category of SV-benchmarks.
  • ~/DSS/INSTALL.md: Instructions for installing CPAchecker and how to check if the VM works as intended.
  • ~/DSS/more_examples.sh: A small collection of examples including visual representation after calculation finished.

Files

DSS-artifact-FSE24-proceedings.zip (8.8 GB, md5:96966e1d8c9c0251bed8dbcf024724d7)