Published March 15, 2024 | Version SPIN24-proceedings

Reproduction Package for SPIN 2024 Article "Verification Witnesses Version 2"

Description

Abstract

This artifact is a reproduction package for the paper "Software Verification Witnesses 2.0", which has been accepted at SPIN 2024.

It consists of all executables, input data, and results required to reproduce the experiments and to inspect the raw data from which the results presented in the paper were extracted. Specifically, it contains the tools Symbiotic, CPAchecker, UAutomizer, and witness-lint, as well as the input data sv-benchmarks as used in SV-COMP24.

The artifact is based on the SoSy-Lab Virtual Machine (Ubuntu 22.04 LTS) and has been set up so that it runs inside this VM without any further configuration.

By default, experiments are run with 2 cores and 15 GB of memory for at most 900 seconds. A full reproduction of the experiments requires roughly 8 months of CPU time. For demonstration purposes, a subset of tasks has been selected which should take at most 1 day to complete and will likely finish sooner. To test that everything is working as intended, all experiments can be run on just two files, which finishes in around 30 minutes.

Contents

This artifact contains the following items:

  • README.md: This file.
  • *.xml: Mostly benchmark definition files for running the experiments using BenchExec.
  • config.sh: Config with parameters for BenchExec.
  • Makefile: Makefile to run the experiments.
  • setup.sh: Script to update the modified time of the CPAchecker binary as required by BenchExec and to add the required dependencies to the witness linter.
  • results.zip: Results of the experiments as presented in the paper. They have been zipped for convenience, since they are around 10 GB in size when uncompressed.
  • change_tasks.py: Script to change the tasks in the xml files from the full set to a subset or to two files.
  • License.txt: License information for the artifact.
  • cache: Contains a helper script to cache the download of the tools and sv-benchmarks. It is not needed to run the artifact, since everything is self-contained, but may be relevant for a reproduction from scratch.
  • csv-generation.xml: XML file to generate the csv files from the results. It tells BenchExec what information to put into the csv files.
  • sv-benchmarks: The SV-COMP24 benchmarks.
  • Tools
    • CPAchecker: CPAchecker version 0af0e41240 in folder cpachecker.
    • Symbiotic: Symbiotic version 9c278f9 in folder val-symbiotic-witch.
    • Symbiotic Witch: Symbiotic Witch version svcomp24.
    • Witch: Witch version b011ec9 in folder symbiotic-witch using Witch Klee in version 6dabb94.
    • UAutomizer: UAutomizer version 0.2.4-?-8430d5a-m in folder uautomizer-graphml and 0.2.4-dev-0e0057c in folder uautomizer-yaml.
    • witness-lint: Witness-lint version 2.0.2 in folder witness_linter.
    • BenchExec: BenchExec version 19a85ac in folder benchexec.

This readme contains the following sections to help the user reproduce the experiments:

  • TL;DR: A quick guide to reproduce the experiments and analyze the data.
  • Environment: Describes the environment in which the artifact was tested and can be run.
  • Experiments: Describes how to execute the experiments and generate the results.
  • Results: Describes where the results can be found and how to analyze them.
  • Known Issues: Describes known issues when executing the artifact.

TL;DR

  1. Run setup.sh to update the modified time of the CPAchecker binary as required by BenchExec.
  2. Change into the directory of the artifact i.e. software-verification-witnesses-2.0-artifact-SPIN24-final.
  3. Run source config.sh to set the environment variables.
  4. Run change_tasks.py with one of the following options to select which task set to run. Be aware that this will create new xml files and save the original ones as .xml.bkp files. By default, all tasks are selected.
    • --all: Run all tasks.
    • --subset: Run a representative subset of tasks.
    • --single: Run a single task with an error and a single task which is correct, i.e. one task for violation and one for correctness.
    • --clean: Restore the original xml files.
  5. Run make all-experiments to run all the experiments.
    • If desired, only particular experiments can be run via make experiment-*. The following naming convention has been used:
      • Experiments related to verification need to be run before the validation or witness analysis experiments.
      • Experiments containing correctness in the name are related to correctness witnesses and experiments containing violation are related to violation witnesses.
      • Experiments containing validate in the name are related to validation of the witnesses.
      • Experiments containing verification in the name are related to verification, i.e. they produce the witnesses.
      • Experiments containing lint in the name are related to witness analysis.
  6. After the experiments are finished, run make generate-experiments-table-all to generate the HTML table with the results.
  7. Open the table and use different filters to validate the results presented in the paper.
    • A full explanation of how to validate the results in the paper is given in the section Results.
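
Put together, a minimal end-to-end session for the quick test looks as follows (a sketch: the exact invocation of the scripts, e.g. ./setup.sh versus an absolute path, is an assumption):

./setup.sh                              # fix the modified time of the CPAchecker binary
cd software-verification-witnesses-2.0-artifact-SPIN24-final
source config.sh                        # set the environment variables
./change_tasks.py --single              # two tasks only; finishes in around 30 minutes
make all-experiments                    # run all experiments in the required order
make generate-experiments-table-all     # generate the HTML table with the results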

Environment

To run the artifact, BenchExec and Python >= 3.8 are required. For the particular tools, please see their respective documentation. A non-exhaustive list of required packages:

  • CPAchecker: Java 17
  • Symbiotic: Python 3.8, Clang 14, LLVM 14
  • UAutomizer: Java 11

Everything needed to run the artifact has been installed in the SoSy-Lab Virtual Machine (Ubuntu 22.04 LTS). It is recommended to run the experiments inside that VM, since no installation is necessary.

To set up the environment on the VM for executing the experiments, please follow these steps:

  • Execute setup.sh to update the modified time of the CPAchecker binary as required by BenchExec.
  • Change into the directory of the artifact i.e. software-verification-witnesses-2.0-artifact-SPIN24-final.
  • Run source config.sh to set the environment variables.

Note: From this point on it is assumed that the working directory is software-verification-witnesses-2.0-artifact-SPIN24-final inside the artifact.

Experiments

Running the Experiments

To evaluate how witnesses perform between versions 1.0 and 2.0, 21 different experiments are provided. The experiments are divided into verification, validation, and witness analysis tasks, and subdivided into violation and correctness tasks.

The provided Makefile documents how to reproduce the different experiments (Makefile targets starting with experiment-). Experiment targets related to verification are marked with verification, validation with validate, and witness analysis with lint. Experiments related to correctness witnesses are marked with correctness and experiments related to violation witnesses with violation. The experiments for witness version 1.0 are marked with graphml and for witness version 2.0 with yaml. Verification experiments already export both witness versions, so extra runs are not necessary. The following targets need to be run, in the following order:

  • Verification experiments
    • experiment-cpachecker-violation-verification
    • experiment-symbiotic-verification
    • experiment-cpachecker-correctness-verification
  • Validation experiments
    • Correctness
      • experiment-cpachecker-validate-cpachecker-correctness-graphml
      • experiment-cpachecker-validate-cpachecker-correctness-yaml
      • experiment-uautomizer-validate-correctness-graphml
      • experiment-uautomizer-validate-correctness-yaml
    • Violation
      • experiment-cpachecker-validate-cpachecker-violation-graphml
      • experiment-cpachecker-validate-cpachecker-violation-yaml
      • experiment-symbiotic-validate-cpachecker-graphml
      • experiment-symbiotic-validate-cpachecker-yaml
      • experiment-symbiotic-validate-symbiotic-graphml
      • experiment-symbiotic-validate-symbiotic-yaml
      • experiment-cpachecker-validate-symbiotic-graphml
      • experiment-cpachecker-validate-symbiotic-yaml
  • Witness analysis experiments
    • Correctness
      • experiment-lint-cpachecker-correctness-graphml
      • experiment-lint-cpachecker-correctness-yaml
    • Violation
      • experiment-lint-cpachecker-violation-graphml
      • experiment-lint-cpachecker-violation-yaml
      • experiment-lint-symbiotic-violation-graphml
      • experiment-lint-symbiotic-violation-yaml

The dependencies between verification and validation of tasks have been made explicit as comments on the commands inside the Makefile.
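
For example, to reproduce only the pipeline for violation witnesses produced by CPAchecker, the following targets (taken from the list above) can be run by hand, verification first:

make experiment-cpachecker-violation-verification                  # produces the witnesses
make experiment-cpachecker-validate-cpachecker-violation-graphml
make experiment-cpachecker-validate-cpachecker-violation-yaml
make experiment-lint-cpachecker-violation-graphml
make experiment-lint-cpachecker-violation-yaml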

Executing all experiments in the required order can be done by using:

make all-experiments

Experiments can be interrupted (with CTRL-C) at any moment to obtain the results for the tasks completed so far; this allows the user to continue with the evaluation pipeline without having to wait for all experiments to finish.

Light Evaluation

To do a light evaluation of the artifact, a subset of tasks can be selected. This is done by executing the script change_tasks.py, which modifies the xml files to contain only a subset of tasks.

Run change_tasks.py with one of the following options to select which task set to run. Be aware that this will create new xml files and save the original ones as .xml.bkp files. By default, all tasks are selected.

  • --all: Run all tasks. This will run all tasks and will take a long time to finish. It is required to reproduce the full results.
  • --subset: Run a subset of tasks. This subset provides an overview of the results and should finish in a reasonable amount of time.
  • --single: Run a single task with an error and a single task which is correct, i.e. one task for violation and one for correctness. The purpose of this option is to test that everything is working as intended; in particular, when using the artifact outside the recommended VM, this tests that all requirements have been installed correctly.
  • --clean: Restore the original xml files from the .xml.bkp backups.
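
For example (assuming the script is executable; otherwise prefix the calls with python3):

./change_tasks.py --subset    # select the representative subset
./change_tasks.py --clean     # restore the original xml files afterwards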

Experiment Requirements

The experiments were run with 2 cores, 15 GB of memory, and a timeout of 900 seconds. The timeout is decreased to 90 seconds for the validation of violation witnesses. If so desired, these parameters can be adjusted in the *.xml files or in the config.sh file. Be aware that CPAchecker and UAutomizer require at least 2 cores. A reasonable reduced configuration is 2 cores, 8 GB of memory, and a 90-second timeout.
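
Alternatively, the limits can be overridden at invocation time: BenchExec accepts resource limits on the command line. A sketch with the reduced values from above (whether the Makefile forwards such extra options is an assumption, so this may require calling BenchExec directly on a benchmark definition):

benchexec --limitCores 2 --memorylimit 8GB --timelimit 90s <benchmark-definition>.xml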

Results

Generating the Results

After the experiments are finished, the results can be generated by running the following command:

make generate-experiments-table-all

This will generate three HTML tables and the corresponding CSV files containing all the results of the experiments in the folder results-processed.

Results used for the Paper

The raw results which were used to obtain the numbers in the paper can be found in the results.zip file. Once unzipped, it contains the following folders:

  • correctness: contains the results for all experiments related to correctness witnesses. This folder contains the raw data to answer RQ1 in the paper for correctness witnesses.
  • violation: contains the results for all experiments related to violation witnesses. This folder contains the raw data to answer RQ1 in the paper for violation witnesses.
  • witness-analysis: contains the results for all experiments related to analysis metrics of the witnesses, divided between violation and correctness witnesses. This folder contains the raw data to answer RQ2 in the paper.
  • tables: contains the processed data which was used to generate the results in the paper. It contains two sets of files:
    • {correctness,violation,witness-analysis}.html: the tables in HTML format, which can be opened in a web browser.
    • {correctness,violation,witness-analysis}.csv: the same tables as CSV files, which can be opened in a spreadsheet program.

Each of the three tables {correctness,violation,witness-analysis}.html covers part of the data.

  • correctness.html contains the results for the validation of correctness witnesses i.e. RQ1 for correctness witnesses.
  • violation.html contains the results for the validation of violation witnesses i.e. RQ1 for violation witnesses.
  • witness-analysis.html contains the results for the analysis of the witnesses i.e. RQ2.

Validate the Results in the Paper

Once you have an HTML table and a corresponding csv file with all the results, you can analyze the HTML table by opening it in a browser. Please be aware that, due to the size of the table, the browser may take a while to load and update it.

For a full description on how to interact with the HTML table, please see the documentation inside BenchExec.

The plots and tables in the paper present the data contained in the html tables in results/tables in a format appropriate for a paper. The HTML table is the ideal way to analyze the results, since it is interactive and allows expressing complex filters. In particular, validating the data presented in the paper is easy using the corresponding filters.

For each of the corresponding tables and plots in the paper, the filters will be given. There are two ways to apply a filter: by appending it to the URL of the table, i.e. changing *.table.html in the browser URL to *.table.html#FILTER, or by selecting the filter manually inside the table. The Summary tab of the table contains a summary of the results, the tab Quantile Plot contains a quantile plot, and the tab Table contains the raw data in table form. Hovering over a header in the Table tab will show which file that column of the table corresponds to.

Reproduce the Results for RQ1

Correctness Witnesses

To do this analysis please use the table correctness.html in the results/tables folder.

To validate the results for RQ1 related to correctness witnesses, you need to filter the column containing the results of the experiment experiment-cpachecker-correctness-verification, usually called cpachecker-correctness-verification.*, in the tab Table for all tasks which were solved correctly. This can be done by selecting status and filtering by correct under category. This will give you the number of tasks which were solved correctly by CPAchecker among the tasks which are expected not to reach the location marked by reach_error. The filter for the table is given by #/?filter=0(0*status*(category(in(correct))))
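
Concretely, appending this filter to the URL looks as follows (browser and relative file path are illustrative):

firefox "results/tables/correctness.html#/?filter=0(0*status*(category(in(correct))))"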

Returning to the Summary tab, you can see the numbers of Table 4 by looking at the results for the columns {cpachecker,uautomizer}-validate-cpachecker-correctness-{graphml,yaml}.* for CPAchecker and UAutomizer as validators for witness versions 1.0 and 2.0. Results marked as true correspond to confirmed witnesses and results marked as false correspond to refuted witnesses.

Changing to the Quantile Plot tab and selecting cputime, you will be able to see the quantile plots for the results of the validation of the correctness witnesses, which correspond to Figure 3 in the paper.

Violation Witnesses

To do this analysis please use the table violation.html in the results/tables folder.

For violation witnesses, the procedure is the same as for correctness witnesses, but since there are two verifiers, both need to be considered. The filtering also needs to be done for tasks with an expected verdict of false instead of true. The columns to look at are cpachecker-violation-verification.* and symbiotic-verification.*, for CPAchecker and Symbiotic as verifiers respectively.

When filtering for the tasks CPAchecker solved correctly, the relevant columns in the tab Summary are {cpachecker,symbiotic}-validate-cpachecker-violation-{graphml,yaml}.* for the different alternatives. These numbers correspond to the ones in Table 5 in the paper for CPAchecker as a verifier. In the tab Quantile Plot you will see the results of Figure 5.a. For CPAchecker as a verifier, the filter is given by: #/?filter=2(0*status*(category(in(correct)))).

When filtering for Symbiotic as a verifier, the relevant columns are {cpachecker,symbiotic}-validate-symbiotic-violation-{graphml,yaml}.* for Table 5. The tab Quantile Plot contains the results of Figure 5.b. For Symbiotic as a verifier, the filter is given by: #/table?filter=9(0*status*(category(in(correct)))).

Validate the Results for RQ2

For this analysis please use the table witness-analysis.html in the results/tables folder.

To obtain the numbers for the tables related to RQ2, and not only validate them, a small analysis needs to be run on the csv file with the same name as the html table; this analysis is not part of the artifact.
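
A minimal shell sketch of such an analysis for a single metric (the tab separation, the three header lines, and the column number of the CSV output are assumptions that need to be adapted to the actual file):

# extract one metric column, skip the header lines, then compute
# min, median, and max of the sorted values
cut -f3 results/tables/witness-analysis.csv | tail -n +4 | sort -n \
  | awk '{ a[NR] = $1 }
         END { print "min:", a[1]
               print "median:", (NR % 2 ? a[(NR+1)/2] : (a[NR/2] + a[NR/2+1]) / 2)
               print "max:", a[NR] }'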

Correctness Witnesses

To validate the results in Table 6, you can use the Table tab to set different ranges for the different metrics of the witnesses. There is a field below each metric where a range can be given. This allows you to check that the numbers correspond to the minimum, median, and maximum: inputting the range :minimum leaves only the single minimum value, the range maximum: leaves only the single maximum value, and the range median: divides the dataset into two halves, which can be seen in the Summary tab.

In contrast to the results for validation, no filtering is required, since all witnesses exported by CPAchecker are used. Be aware that CPAchecker exports a correctness witness even if the task times out, in order to show the user the current state of the analysis. The numbers presented here are therefore based on all correctness witnesses exported by CPAchecker, not only on those for tasks which were solved correctly.

The relevant columns for this analysis are lint-correctness-witnesses-{graphml,yaml}.* for witnesses version 1.0 and 2.0 respectively.

Violation Witnesses

To validate the results in Table 7 the same procedure as for correctness witnesses can be used. The only difference is that the numbers are based on violation witnesses and not correctness witnesses.

The relevant columns for this analysis are lint-violation-witnesses-{cpachecker,symbiotic}-{graphml,yaml}.* for witnesses produced by CPAchecker/Symbiotic in versions 1.0 and 2.0 respectively.

Known Issues

Warnings

The benchmark files *.xml are the ones used in SV-COMP 2024 and therefore contain the tasks for all properties. The configuration file config.sh filters them such that only the reachability tasks are used. This happens after BenchExec has started processing the files, which is the cause of most warnings.

During the execution of the experiments, some warnings may appear. For example:

2024-XX-XX XX:XX:XX - WARNING - No files found matching 'sv-benchmarks/c/SoftwareSystems-uthash-MemCleanup.set'.
2024-XX-XX XX:XX:XX - WARNING - No files found matching 'memsafety-broom/*.yml'.
2024-XX-XX XX:XX:XX - WARNING - No files found matching 'sv-benchmarks/c/SoftwareSystems-DeviceDriversLinux64-Termination.set'.
2024-XX-XX XX:XX:XX - WARNING - CPU throttled itself during benchmarking due to overheating. Benchmark results are unreliable!

These warnings can be ignored. The 'No files found' warnings concern files which are not related to the property being analyzed, unreach-call, and therefore do not affect the reproduction of the results. The throttling warning may affect the results, but should not happen when reproducing the results outside a VM.

The following warning can also be safely ignored, since it only indicates that no witness was found for the task. This is expected for tasks which could not be verified correctly or tasks which were not used during verification, for example those for properties other than the reachability of an error function:

2024-XX-XX XX:XX:XX - WARNING - Pattern ... in required tag did not match any file for task ...

Errors

When BenchExec tries to create an overlay mount while a shared folder is mounted inside the VM, the following error may occur:

2024-XX-XX XX:XX:XX - ERROR - Failed to create overlay mount for /home/...: invalid argument ...

In this case, the shared folder should be unmounted and the experiments should be run again.
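
For example (the mount point of the shared folder is illustrative):

sudo umount /media/sf_shared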

Files

witnesses-2.0-artifact-SPIN24-proceedings.zip (4.0 GB, md5:9c559fee4251515a7c08bf64963ec287)

Additional details

Dates

Submitted: 2024-03-16 (SPIN 2024, proceedings paper)