
Published January 9, 2026 | Version fase26-artifact-evaluation
Software Open

Artifact for the FASE 26 Paper: Testing in Formal Verification via Witness Generation (Empirical Evaluation)

  • 1. Ludwig-Maximilians-Universität München
  • 2. LMU Munich

Description

Artifact for the FASE 26 Paper

Testing in Formal Verification via Witness Generation (Empirical Evaluation)

This artifact guides you through reproducing the plots and tables presented in the paper and through rerunning the experimental evaluation.

The artifact is shipped as a VirtualBox VM.

  • The artifact is contained within an Ubuntu 24.04 VM.
  • We require at least 16 GB of memory and 4 CPU cores.
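
Once logged in to the VM, the resource requirements can be checked with a short script. The following is a minimal sketch, not part of the artifact: the function names and the 1 GiB slack (to allow for memory the kernel reserves) are assumptions made for illustration.

```python
import os
from pathlib import Path

REQUIRED_CORES = 4      # stated requirement
REQUIRED_MEM_GB = 16    # stated requirement


def mem_total_gib(meminfo_text: str) -> float:
    """Return MemTotal from /proc/meminfo-style text, converted from kB to GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])
            return kib / (1024 * 1024)
    raise ValueError("MemTotal not found")


def meets_requirements(cores: int, mem_gib: float, slack_gib: float = 1.0) -> bool:
    """Check cores and memory; slack_gib is a hypothetical tolerance, not from the paper."""
    return cores >= REQUIRED_CORES and mem_gib >= REQUIRED_MEM_GB - slack_gib


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # only available on Linux, e.g. inside the VM
        cores = os.cpu_count() or 0
        mem = mem_total_gib(meminfo.read_text())
        print(f"cores={cores} mem={mem:.1f} GiB ok={meets_requirements(cores, mem)}")
```

If the check fails, the VM settings can be adjusted in VirtualBox while the VM is powered off.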

Getting Started

Username and Password
user: test-to-witness
password: fase26

This artifact contains:

  • analysis/ : Files concerning plots
  • backup/ : A backup of the experiment data
  • benchexec/ : The benchmarking tool
  • benchmark-defs/ : Files defining benchmarks
  • benchmark-sets/ : Files describing sets of tasks
  • combinations/ : Code of the portfolio
  • data/ : data from SV-COMP 25
  • expected/ : The expected plots
  • Makefile
  • results/ : Here are the generated witnesses from test-to-witness
  • results-baseline/ : results of the baseline (just the verifier) for the portfolios
  • results-portfolio/ : results of the portfolio experiments
  • results-validate/ : results of the validation of generated witnesses
  • results-verified/ : Data from Test-Comp 25
  • sv-benchmarks/ : The sv-benchmark tasks
  • table-defs/ : Files defining how to create csv tables from the raw experiment data
  • test-to-witness/ : The tool test-to-witness
  • tool_info/ : Files telling benchexec how to run our tools
  • validators/ : The validators used in the experiment

If you are interested only in the tool Test-To-Witness, you may navigate into the test-to-witness folder, which includes a README on how to use test-to-witness. Please note: we did not install a Go distribution in the VM, so it is not possible to recompile test-to-witness. For the most up-to-date version, we refer readers to our public GitLab repository: https://gitlab.com/sosy-lab/software/test-to-witness.

This artifact document consists of the following steps:

  1. Downloading and installing the VM
  2. Recreating the plots and tables from the paper
  3. Running a subset of the experiments
  4. Running the full set of experiments

For Getting Started, we recommend completing Sections 1, 2, and 3.1; these can be completed in less than 30 min.

Sections 2, 3 and 4 also serve as the Step-by-Step guide.

1. Download and Installation

Instructions

  1. Download the artifact VM from Zenodo using the DOI provided with the paper. The download consists of a single VirtualBox appliance file with the extension .ova.

  2. Install Oracle VirtualBox (version 7 or newer is recommended) from the official website.

  3. Start VirtualBox and select File → Import Appliance.

  4. Choose the downloaded .ova file and follow the import wizard. You may adjust the VM settings during import; we require allocating at least 16 GB of RAM and 4 CPU cores.

  5. After the import finishes, select the VM from the list and click Start to launch it.

  6. Log in to the VM using the credentials: user: test-to-witness pw: fase26

Inside the VM

To keep the VM download as small as possible, we compressed the data inside the VM. Before starting the evaluation, please decompress the data. This takes approximately 15-30 min. The resulting size is approximately 49 GB.

pigz -dc fase26.tar.gz | pv | tar xf -
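
Before decompressing, it may be worth confirming that the VM disk has enough free space. The sketch below is an illustration only: it interprets the stated size as 49 GB (10^9 bytes each), and the function name is hypothetical. The compressed archive stays on disk, so the space is needed in addition to it.

```python
import shutil

REQUIRED_GB = 49  # approximate decompressed size stated above (assumed to mean 10**9-byte GB)


def enough_space(free_bytes: int, required_gb: float = REQUIRED_GB) -> bool:
    """Return True if free_bytes covers the approximate decompressed size."""
    return free_bytes >= required_gb * 10**9


if __name__ == "__main__":
    # Free space on the file system that holds the current directory.
    free = shutil.disk_usage(".").free
    print(f"free: {free / 10**9:.1f} GB, sufficient: {enough_space(free)}")
```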

2. Recreating the Plots and Tables of the Paper

To recreate the plots, first open a terminal and change into the directory containing all data and scripts:

cd $HOME/fase26

To generate all plots and tables used in the paper, run the following command (this takes approximately 1–2 minutes):

make plots

Expected Output:

Copying data for plots...
mkdir -p analysis/raw/
cp table-defs/portfolio-cpachecker.table.csv analysis/raw/
cp table-defs/portfolio-esbmc-kind.table.csv analysis/raw/
cp table-defs/portfolio-uautomizer.table.csv analysis/raw/
cp table-defs/portfolio-symbiotic.table.csv analysis/raw/
cp table-defs/portfolio-bubaak.table.csv analysis/raw/
cp table-defs/portfolio-cpv.table.csv analysis/raw/
cp data/reach_safety.table.csv analysis/raw/
cp results-verified/META_Cover-Error.table.csv analysis/raw/
cp table-defs/creation.table.csv analysis/raw/
cp table-defs/validation.table.csv analysis/raw/
Copied data successfully.
Creating plots...
cd analysis && python3 plot.py
RQ1: Converted tasks (done): 10897
< 1.0s: 10490/10897 (96.27%)
< 0.2s: 8069/10897 (74.05%)
Created plots successfully.
Plots are in the analysis/final_plots/ directory

The generated plots are located in analysis/final_plots/.
For convenience, we also provide the expected plots and tables in the directory expected/ so that you can directly compare results.
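
A quick way to start the comparison is to check that both directories contain the same file names. This is a sketch under the assumption that expected/ and analysis/final_plots/ use identical file names; it does not compare file contents, because generated PDFs can differ byte-wise (for example in embedded timestamps) even when the plots are the same.

```python
from pathlib import Path


def compare_names(expected_names, generated_names):
    """Return (missing, extra): names only in expected, and names only in generated."""
    expected, generated = set(expected_names), set(generated_names)
    return sorted(expected - generated), sorted(generated - expected)


def file_names(directory: Path):
    """List the names of regular files directly inside a directory."""
    return [p.name for p in directory.iterdir() if p.is_file()]


if __name__ == "__main__":
    exp, gen = Path("expected"), Path("analysis/final_plots")
    if exp.is_dir() and gen.is_dir():  # run from $HOME/fase26
        missing, extra = compare_names(file_names(exp), file_names(gen))
        print("missing from final_plots:", missing)
        print("not in expected:", extra)
```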

3. Running a Subset of the Experiments

Because the full experiments are time-consuming, we provide a smoke-test mode that executes a small but representative subset of the experiments.

There are three experiments in the paper:

  1. Creating witnesses from test suites

  2. Validating the generated witnesses

  3. Running parallel verifier–test-generator portfolios

Backup of Results

The experiments write to the same files as the original experiments conducted by the authors. The benchmarking utility benchexec terminates with an error if one of the old experiment files is still present. In the following steps, we therefore manually move the result directories out of the way before each new run. An additional backup of all original results is stored in backups/backup.tar.gz. It may be restored with make restore-experiment-data.
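
The name clash arises because the Makefile targets pass a fixed --startTime to benchexec, so the result file names are identical on every run. The sketch below lists files in a results directory that would collide. The naming pattern `<benchmark>.<YYYY-MM-DD_HH-MM-SS>.results...` is read off the expected outputs in this document rather than taken from the benchexec documentation, and the helper names are hypothetical.

```python
from datetime import datetime
from pathlib import Path


def result_prefix(benchmark: str, start_time: str) -> str:
    """Build the '<benchmark>.<YYYY-MM-DD_HH-MM-SS>' prefix seen in the result file names."""
    stamp = datetime.strptime(start_time, "%Y-%m-%d %H:%M:%S")
    return f"{benchmark}.{stamp:%Y-%m-%d_%H-%M-%S}"


def stale_results(file_names, benchmark: str, start_time: str):
    """Return the names that start with the prefix a new run would reuse."""
    prefix = result_prefix(benchmark, start_time) + "."
    return sorted(name for name in file_names if name.startswith(prefix))


if __name__ == "__main__":
    results = Path("results")
    if results.is_dir():  # run from $HOME/fase26
        names = [p.name for p in results.iterdir()]
        # Start time as shown in the expected output of Section 3.1.
        print(stale_results(names, "test2witness", "2025-10-16 10:29:27"))
```

An empty list suggests that the directory has been moved away and the experiment can start.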

3.1 RQ1: Creating Witnesses from Test Suites

To generate witnesses from test cases, execute:

cd $HOME/fase26
mv results results.bak
make small-witnesses-from-testsuites

Expected Output:

PYTHONPATH=tool_info benchexec/bin/benchexec --read-only-dir=/ --hidden-dir /tmp --overlay-dir=/home --no-compress-results -o results/ --tool-directory test-to-witness --startTime '2025-10-16 10:29:27' benchmark-defs/small/test2witness.xml
2026-01-08 14:09:59 - WARNING - No propertyfile specified. Score computation will ignore the results.
2026-01-08 14:09:59 - INFO - Unable to find pqos_wrapper, please install it for cache allocation and monitoring if your CPU supports Intel RDT (cf. https://gitlab.com/sosy-lab/software/pqos-wrapper).

executing run set 'esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites'     (1 file)
14:09:59   06_fuzzle_50x50_0-cycle.yml    done                         2.30    2.67

executing run set 'fusebmc-coverage-error-call-confirmed.fusebmc-test-suites'     (1 file)
14:10:02   array_ptr_single_elem_init-2.yml    done                         5.64    3.54

executing run set 'hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites'     (1 file)
14:10:06   btor2c-lazyMod.driving_phils.3.prop1-func-interl.yml    done                         1.33    1.72

executing run set 'kleef-coverage-error-call-confirmed.kleef-test-suites'     (1 file)
14:10:08   s4liff.yml              done                         0.26    0.49

executing run set 'owic-coverage-error-call-confirmed.owic-test-suites'     (1 file)
14:10:08   elevator_spec3_product23.cil.yml    done                         0.33    0.57

executing run set 'tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites'     (1 file)
14:10:09   list-2.yml              done                       175.25  167.49

executing run set 'utestgen-coverage-error-call-confirmed.utestgen-test-suites'     (1 file)
14:12:57   rangesum.yml            done                         0.26    0.44

executing run set 'wasp-c-coverage-error-call-confirmed.wasp-c-test-suites'     (1 file)
14:12:58   brs5f.yml               done                         0.21    0.70

Statistics:              8 Files
  correct:               0
    correct true:        0
    correct false:       0
  incorrect:             0
    incorrect true:      0
    incorrect false:     0
  unknown:               0

In order to get HTML and CSV tables, run
benchexec/bin/table-generator results/test2witness.2025-10-16_10-29-27.results.esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites.xml results/test2witness.2025-10-16_10-29-27.results.fusebmc-coverage-error-call-confirmed.fusebmc-test-suites.xml results/test2witness.2025-10-16_10-29-27.results.hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites.xml results/test2witness.2025-10-16_10-29-27.results.kleef-coverage-error-call-confirmed.kleef-test-suites.xml results/test2witness.2025-10-16_10-29-27.results.owic-coverage-error-call-confirmed.owic-test-suites.xml results/test2witness.2025-10-16_10-29-27.results.tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites.xml results/test2witness.2025-10-16_10-29-27.results.utestgen-coverage-error-call-confirmed.utestgen-test-suites.xml results/test2witness.2025-10-16_10-29-27.results.wasp-c-coverage-error-call-confirmed.wasp-c-test-suites.xml

This command generates witnesses from test cases produced by 8 different test-case generators; the selected tasks represent the different runtimes a conversion can take.

We recreate the creation.table.csv used by the plot script with our newly created data:

make -B table-defs/creation.table.csv

Expected Output:

Generating creation.table.csv...
benchexec/bin/table-generator -x table-defs/creation.xml -f csv --no-diff -o table-defs/
INFO: Reading table definition from 'table-defs/creation.xml'...
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.cetfuzz-coverage-error-call-confirmed.cetfuzz-test-suites.xml.bz2'.
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites.xml.bz2
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.esbmc-kind-coverage-error-call-confirmed.esbmc-kind-test-suites.xml.bz2'.
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.fdse-coverage-error-call-confirmed.fdse-test-suites.xml.bz2'.
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.fizzer-coverage-error-call-confirmed.fizzer-test-suites.xml.bz2'.
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.fusebmc-coverage-error-call-confirmed.fusebmc-test-suites.xml.bz2
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.fusebmc-ia-coverage-error-call-confirmed.fusebmc-ia-test-suites.xml.bz2'.
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites.xml.bz2
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.klee-coverage-error-call-confirmed.klee-test-suites.xml.bz2'.
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.kleef-coverage-error-call-confirmed.kleef-test-suites.xml.bz2
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.legion-coverage-error-call-confirmed.legion-test-suites.xml.bz2'.
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.legion-symcc-coverage-error-call-confirmed.legion-symcc-test-suites.xml.bz2'.
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.owic-coverage-error-call-confirmed.owic-test-suites.xml.bz2
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.prtest-coverage-error-call-confirmed.prtest-test-suites.xml.bz2'.
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.rizzer-coverage-error-call-confirmed.rizzer-test-suites.xml.bz2'.
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.symbiotic-coverage-error-call-confirmed.symbiotic-test-suites.xml.bz2'.
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.tracerx-coverage-error-call-confirmed.tracerx-test-suites.xml.bz2'.
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.utestgen-coverage-error-call-confirmed.utestgen-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.wasp-c-coverage-error-call-confirmed.wasp-c-test-suites.xml.bz2
WARNING: Missing module ".testtowitness", cannot extract values from log files (ImportError: No module named '.testtowitness').
INFO: Merging results...
INFO: The resulting table will have 8 rows and 3 columns (in 1 run sets).
INFO: Generating table...
INFO: Writing CSV  into table-defs/creation.table.csv ...
INFO: done
Generated creation.table.csv successfully
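
The "No file matches" warnings above appear to be expected in smoke-test mode: the table definition lists more run sets than the eight that were just executed. To see at a glance which test generators contributed data, the table-generator log can be split into found and missing run sets. This is a sketch that assumes the log format shown above; the regular expression and function name are illustrative.

```python
import re

# Generator name sits between ".results." and "-coverage-error-call-confirmed."
NAME_RE = re.compile(r"\.results\.(?P<gen>.+?)-coverage-error-call-confirmed\.")


def classify(log_text: str):
    """Return (found, missing) generator names from table-generator log lines."""
    found, missing = [], []
    for line in log_text.splitlines():
        match = NAME_RE.search(line)
        if match is None:
            continue  # line does not mention a result file
        if line.startswith("WARNING: No file matches"):
            missing.append(match["gen"])
        elif line.startswith("INFO:"):
            found.append(match["gen"])
    return found, missing


if __name__ == "__main__":
    sample = (
        "WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27"
        ".results.cetfuzz-coverage-error-call-confirmed.cetfuzz-test-suites.xml.bz2'.\n"
        "INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27"
        ".results.esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites.xml.bz2\n"
    )
    print(classify(sample))
```

For the smoke test, the found list should contain the eight generators named in the benchexec output.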

After the command finishes, we run the plot generation again:

make plots

Expected Output:

Copying data for plots...
mkdir -p analysis/raw/
cp table-defs/portfolio-cpachecker.table.csv analysis/raw/
cp table-defs/portfolio-esbmc-kind.table.csv analysis/raw/
cp table-defs/portfolio-uautomizer.table.csv analysis/raw/
cp table-defs/portfolio-symbiotic.table.csv analysis/raw/
cp table-defs/portfolio-bubaak.table.csv analysis/raw/
cp table-defs/portfolio-cpv.table.csv analysis/raw/
cp data/reach_safety.table.csv analysis/raw/
cp results-verified/META_Cover-Error.table.csv analysis/raw/
cp table-defs/creation.table.csv analysis/raw/
cp table-defs/validation.table.csv analysis/raw/
Copied data successfully.
Creating plots...
cd analysis && python3 plot.py
RQ1: Converted tasks (done): 8
< 1.0s: 4/8 (50.00%)
< 0.2s: 0/8 (0.00%)
Created plots successfully.
Plots are in the analysis/final_plots/ directory

Taking a look at analysis/final_plots/Figure06.pdf now shows an updated version of the plot using the few data points that were just created.

open analysis/final_plots/Figure06.pdf
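
The RQ1 summary lines printed by make plots can be recomputed by hand from the times in the benchexec output of this section. The sketch below uses the first of the two time columns of that expected output; which column plot.py actually reads is not stated here, but for this sample both columns give the same counts. The times are the ones shown in this document, so your own run will yield different values.

```python
def share_below(times, threshold):
    """Return (count below threshold, total, percentage)."""
    count = sum(1 for t in times if t < threshold)
    return count, len(times), 100.0 * count / len(times)


def summary_line(times, threshold):
    """Format the share in the same style as the plot script's output."""
    count, total, pct = share_below(times, threshold)
    return f"< {threshold}s: {count}/{total} ({pct:.2f}%)"


# First time column of the eight smoke-test runs shown in the expected output.
TIMES = [2.30, 5.64, 1.33, 0.26, 0.33, 175.25, 0.26, 0.21]

if __name__ == "__main__":
    print(f"Converted tasks: {len(TIMES)}")
    print(summary_line(TIMES, 1.0))
    print(summary_line(TIMES, 0.2))
```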

3.2 RQ1: Validating the Generated Witnesses

If you completed Step 3.1, you have two options:

  1. Continue with the witnesses you just generated.
  2. Restore and use the witnesses generated by the authors.

If you choose option 2, restore the original results. Warning: This will delete the current results directory.

[ -d results.bak ] && rm -r results && mv results.bak results

To validate witnesses using both validators employed in the paper (CPAchecker and Witch), run:

mv results-validate results-validate.bak
make small-validate-generated-witnesses

⚠️ This command will run for approximately 10-15 min, depending on your system.
Expected Output:

benchexec/bin/benchexec --read-only-dir=/ --hidden-dir /tmp --overlay-dir=/home -o results-validate/ --startTime '2025-08-05 23:18:21' --tool-directory validators/witch benchmark-defs/small/witch-validate-violation-witnesses-2.0.xml 
2026-01-08 14:41:34 - WARNING - No propertyfile specified. Score computation will ignore the results.
2026-01-08 14:41:35 - WARNING - Ignoring specified resource requirements in local-execution mode, only resource limits are used.
2026-01-08 14:41:35 - INFO - Unable to find pqos_wrapper, please install it for cache allocation and monitoring if your CPU supports Intel RDT (cf. https://gitlab.com/sosy-lab/software/pqos-wrapper).

executing run set 'esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites'     (1 file)
14:41:35   06_fuzzle_50x50_0-cycle.yml    false(unreach-call)         24.89   24.96

executing run set 'fusebmc-coverage-error-call-confirmed.fusebmc-test-suites'     (1 file)
14:42:00   array_ptr_single_elem_init-2.yml    TIMEOUT (validation preprocessing)   90.81   91.04

executing run set 'hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites'     (1 file)
14:43:32   btor2c-lazyMod.driving_phils.3.prop1-func-interl.yml    false(unreach-call)         16.85   16.88

executing run set 'kleef-coverage-error-call-confirmed.kleef-test-suites'     (1 file)
14:43:49   s4liff.yml              false(unreach-call)          1.79    1.80

executing run set 'owic-coverage-error-call-confirmed.owic-test-suites'     (1 file)
14:43:51   elevator_spec3_product23.cil.yml    false(unreach-call)          2.09    2.10

executing run set 'tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites'     (1 file)
14:43:53   list-2.yml              false(unreach-call)         36.12   36.17

executing run set 'utestgen-coverage-error-call-confirmed.utestgen-test-suites'     (1 file)
14:44:30   rangesum.yml            false(unreach-call)          2.06    2.07

executing run set 'wasp-c-coverage-error-call-confirmed.wasp-c-test-suites'     (1 file)
14:44:32   brs5f.yml               false(unreach-call)          2.16    2.21

Statistics:              8 Files
  correct:               0
    correct true:        0
    correct false:       0
  incorrect:             0
    incorrect true:      0
    incorrect false:     0
  unknown:               1

In order to get HTML and CSV tables, run
benchexec/bin/table-generator results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites.xml.bz2 results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.fusebmc-coverage-error-call-confirmed.fusebmc-test-suites.xml.bz2 results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites.xml.bz2 results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.kleef-coverage-error-call-confirmed.kleef-test-suites.xml.bz2 results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.owic-coverage-error-call-confirmed.owic-test-suites.xml.bz2 results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites.xml.bz2 results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.utestgen-coverage-error-call-confirmed.utestgen-test-suites.xml.bz2 results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.wasp-c-coverage-error-call-confirmed.wasp-c-test-suites.xml.bz2
benchexec/bin/benchexec --read-only-dir=/ --hidden-dir /tmp --overlay-dir=/home -o results-validate/ --startTime '2025-08-05 23:17:34' --tool-directory validators/cpachecker-4.0 benchmark-defs/small/cpachecker-validate-violation-witnesses-2.0.xml
2026-01-08 14:44:35 - WARNING - No propertyfile specified. Score computation will ignore the results.
2026-01-08 14:44:36 - WARNING - Ignoring specified resource requirements in local-execution mode, only resource limits are used.
2026-01-08 14:44:36 - INFO - Unable to find pqos_wrapper, please install it for cache allocation and monitoring if your CPU supports Intel RDT (cf. https://gitlab.com/sosy-lab/software/pqos-wrapper).

executing run set 'esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites'     (1 file)
14:44:36   06_fuzzle_50x50_0-cycle.yml    TIMEOUT                     97.30   73.34

executing run set 'fusebmc-coverage-error-call-confirmed.fusebmc-test-suites'     (1 file)
14:45:50   array_ptr_single_elem_init-2.yml    TIMEOUT                     96.14   59.47

executing run set 'hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites'     (1 file)
14:46:50   btor2c-lazyMod.driving_phils.3.prop1-func-interl.yml    true                        20.28   11.52

executing run set 'kleef-coverage-error-call-confirmed.kleef-test-suites'     (1 file)
14:47:01   s4liff.yml              true                         9.48    5.71

executing run set 'owic-coverage-error-call-confirmed.owic-test-suites'     (1 file)
14:47:07   elevator_spec3_product23.cil.yml    true                        15.90    9.06

executing run set 'tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites'     (1 file)
14:47:17   list-2.yml              true                        14.92    8.70

executing run set 'utestgen-coverage-error-call-confirmed.utestgen-test-suites'     (1 file)
14:47:26   rangesum.yml            true                        10.30    6.17

executing run set 'wasp-c-coverage-error-call-confirmed.wasp-c-test-suites'     (1 file)
14:47:32   brs5f.yml               true                         9.28    5.64

Statistics:              8 Files
  correct:               0
    correct true:        0
    correct false:       0
  incorrect:             0
    incorrect true:      0
    incorrect false:     0
  unknown:               2

In order to get HTML and CSV tables, run
benchexec/bin/table-generator results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites.xml.bz2 results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.fusebmc-coverage-error-call-confirmed.fusebmc-test-suites.xml.bz2 results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites.xml.bz2 results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.kleef-coverage-error-call-confirmed.kleef-test-suites.xml.bz2 results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.owic-coverage-error-call-confirmed.owic-test-suites.xml.bz2 results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites.xml.bz2 results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.utestgen-coverage-error-call-confirmed.utestgen-test-suites.xml.bz2 results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.wasp-c-coverage-error-call-confirmed.wasp-c-test-suites.xml.bz2

This validates the witnesses generated from the sample set of test suites. The output shows where a validator confirms the result: if the line of a task says false(unreach-call), the witness was confirmed. Our subset includes one case, array_ptr_single_elem_init-2.yml, where both validators run into a timeout. In Table 2 of the paper, this task would contribute to the ~15% of unconfirmed witnesses of fusebmc.
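
The confirmation rule can be made concrete with the statuses from the expected output above. The sketch applies only the rule stated in this section (a status of false(unreach-call) means confirmed) and assumes that a witness counts as confirmed when at least one validator confirms it. That combination rule is an assumption, not taken from the plot script, and how the script treats other statuses is not specified here.

```python
CONFIRMED = "false(unreach-call)"

# Statuses copied from the expected output of the two validator runs.
RESULTS = {
    "witch": {
        "06_fuzzle_50x50_0-cycle.yml": "false(unreach-call)",
        "array_ptr_single_elem_init-2.yml": "TIMEOUT (validation preprocessing)",
        "btor2c-lazyMod.driving_phils.3.prop1-func-interl.yml": "false(unreach-call)",
        "s4liff.yml": "false(unreach-call)",
        "elevator_spec3_product23.cil.yml": "false(unreach-call)",
        "list-2.yml": "false(unreach-call)",
        "rangesum.yml": "false(unreach-call)",
        "brs5f.yml": "false(unreach-call)",
    },
    "cpachecker": {
        "06_fuzzle_50x50_0-cycle.yml": "TIMEOUT",
        "array_ptr_single_elem_init-2.yml": "TIMEOUT",
        "btor2c-lazyMod.driving_phils.3.prop1-func-interl.yml": "true",
        "s4liff.yml": "true",
        "elevator_spec3_product23.cil.yml": "true",
        "list-2.yml": "true",
        "rangesum.yml": "true",
        "brs5f.yml": "true",
    },
}


def confirmed_tasks(results_by_validator):
    """Tasks for which at least one validator reports false(unreach-call)."""
    confirmed = set()
    for results in results_by_validator.values():
        for task, status in results.items():
            if status.startswith(CONFIRMED):
                confirmed.add(task)
    return confirmed


def confirmation_rate(results_by_validator):
    """Percentage of all validated tasks that count as confirmed."""
    tasks = {t for results in results_by_validator.values() for t in results}
    return 100.0 * len(confirmed_tasks(results_by_validator)) / len(tasks)


if __name__ == "__main__":
    print(len(confirmed_tasks(RESULTS)), f"{confirmation_rate(RESULTS):.1f}")
```

Under these assumptions, seven of the eight witnesses count as confirmed, and only the fusebmc task remains unconfirmed.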

We can recreate the plots by first updating the validation.table.csv and then recreating the plots.

make -B table-defs/validation.table.csv
make plots

Expected Output:

Generating validation.xml...
Generated validation.xml successfully
Generating validation.csv...
benchexec/bin/table-generator -x table-defs/validation.xml --no-diff -o table-defs -f csv
INFO: Reading table definition from 'table-defs/validation.xml'...
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.fusebmc-coverage-error-call-confirmed.fusebmc-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.kleef-coverage-error-call-confirmed.kleef-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.owic-coverage-error-call-confirmed.owic-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.utestgen-coverage-error-call-confirmed.utestgen-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.wasp-c-coverage-error-call-confirmed.wasp-c-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.fusebmc-coverage-error-call-confirmed.fusebmc-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.kleef-coverage-error-call-confirmed.kleef-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.owic-coverage-error-call-confirmed.owic-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.utestgen-coverage-error-call-confirmed.utestgen-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.wasp-c-coverage-error-call-confirmed.wasp-c-test-suites.xml.bz2
INFO: Merging results...
INFO: The resulting table will have 8 rows and 6 columns (in 2 run sets).
INFO: Generating table...
INFO: Writing CSV  into table-defs/validation.table.csv ...
INFO: done
Generated validation.csv successfully
Copying data for plots...
mkdir -p analysis/raw/
cp table-defs/portfolio-cpachecker.table.csv analysis/raw/
cp table-defs/portfolio-esbmc-kind.table.csv analysis/raw/
cp table-defs/portfolio-uautomizer.table.csv analysis/raw/
cp table-defs/portfolio-symbiotic.table.csv analysis/raw/
cp table-defs/portfolio-bubaak.table.csv analysis/raw/
cp table-defs/portfolio-cpv.table.csv analysis/raw/
cp data/reach_safety.table.csv analysis/raw/
cp results-verified/META_Cover-Error.table.csv analysis/raw/
cp table-defs/creation.table.csv analysis/raw/
cp table-defs/validation.table.csv analysis/raw/
Copied data successfully.
Creating plots...
cd analysis && python3 plot.py
RQ1: Converted tasks (done): 8
< 1.0s: 4/8 (50.00%)
< 0.2s: 0/8 (0.00%)
Created plots successfully.
Plots are in the analysis/final_plots/ directory

Inspecting Table 2 shows the updated results:

cat analysis/final_plots/Table02.tex

Expected Output:

\begin{tabular}{l S[table-format=5.0] S[table-format=5.0] S[table-format=5.0] S[table-format=3.1]}
\toprule
Tester & \multicolumn{1}{c}{\#\,Suites} & \multicolumn{1}{c}{\#\,Converted} & \multicolumn{1}{c}{\#\,Confirmed} & \multicolumn{1}{c}{\%\,Confirmed}\\
\midrule
esbmc-incr & 1 & 1 & 1 & 100.0\\
fusebmc & 1 & 1 & 0 & 0.0\\
hybridtiger & 1 & 1 & 1 & 100.0\\
kleef & 1 & 1 & 1 & 100.0\\
owic & 1 & 1 & 1 & 100.0\\
tracerx-wp & 1 & 1 & 1 & 100.0\\
utestgen & 1 & 1 & 1 & 100.0\\
wasp-c & 1 & 1 & 1 & 100.0\\
\midrule
Total & 8 & 8 & 7 & 87.5\\
\bottomrule
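
The Total row of the table can be cross-checked against the per-tester rows. The following sketch parses the row format shown above; the helper names are hypothetical, and it assumes that the percentage is computed as confirmed divided by converted, which matches the values in this sample.

```python
def parse_rows(tex: str):
    """Map the first cell of each data row to (suites, converted, confirmed, percent)."""
    rows = {}
    for line in tex.splitlines():
        line = line.strip()
        if "&" not in line or line.startswith("Tester"):
            continue  # skip rules, the tabular preamble, and the header row
        cells = [c.strip() for c in line.rstrip("\\").split("&")]
        rows[cells[0]] = (int(cells[1]), int(cells[2]), int(cells[3]), float(cells[4]))
    return rows


def totals_consistent(rows) -> bool:
    """Check that the Total row equals the column sums and the derived percentage."""
    testers = [v for k, v in rows.items() if k != "Total"]
    sums = tuple(sum(v[i] for v in testers) for i in range(3))
    suites, converted, confirmed, pct = rows["Total"]
    return sums == (suites, converted, confirmed) and round(100.0 * confirmed / converted, 1) == pct


SAMPLE = r"""
esbmc-incr & 1 & 1 & 1 & 100.0\\
fusebmc & 1 & 1 & 0 & 0.0\\
hybridtiger & 1 & 1 & 1 & 100.0\\
kleef & 1 & 1 & 1 & 100.0\\
owic & 1 & 1 & 1 & 100.0\\
tracerx-wp & 1 & 1 & 1 & 100.0\\
utestgen & 1 & 1 & 1 & 100.0\\
wasp-c & 1 & 1 & 1 & 100.0\\
\midrule
Total & 8 & 8 & 7 & 87.5\\
"""

if __name__ == "__main__":
    print(totals_consistent(parse_rows(SAMPLE)))
```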

3.3 RQ2: Test Generators in SV-COMP

This step is empirical and uses data from previous competitions: Test-Comp 25 and SV-COMP 25. Both competitions provide respective artifacts. We include the subset necessary to run our evaluation in this artifact.

This is for reference. You do not need to download either of these.

  • Artifact of Test-Comp 25: DOI 10.5282/ubm/data.667 (107 GB)
  • Artifact of SV-COMP 25: DOI 10.5281/zenodo.15012085 (54 GB)

If you ran the plot generation in Step 2: This is already it! Figure07 and Figure08 already show our results from the empirical analysis.

If you did not do Step 2:
⚠️ This command will remove results you already generated into results and results-validate. If you wish to keep those results, move them somewhere outside the fase26 folder.
⚠️ Restoring the files takes about 5 min. The restored data takes about 8.1 GiB.

rm -r results results-validate
make restore-experiment-data
make -B table-defs/validation.table.csv table-defs/creation.table.csv
make plots

Expected Output:

Restoring experiment data...
Restoring from backups/experiment-data.tar.gz
8.17GiB 0:04:50 [28.8MiB/s] [                           <=>                                                                                                                            ]
Generating validation.xml...
Generated validation.xml successfully
Generating validation.csv...
PYTHONPATH=tool_info benchexec/bin/table-generator -x table-defs/validation.xml --no-diff -o table-defs -f csv
INFO: Reading table definition from 'table-defs/validation.xml'...
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.cetfuzz-coverage-error-call-confirmed.cetfuzz-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.cetfuzz-coverage-error-call-confirmed.cetfuzz-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.esbmc-kind-coverage-error-call-confirmed.esbmc-kind-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.esbmc-kind-coverage-error-call-confirmed.esbmc-kind-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.fdse-coverage-error-call-confirmed.fdse-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.fdse-coverage-error-call-confirmed.fdse-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.fizzer-coverage-error-call-confirmed.fizzer-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.fizzer-coverage-error-call-confirmed.fizzer-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.fusebmc-coverage-error-call-confirmed.fusebmc-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.fusebmc-coverage-error-call-confirmed.fusebmc-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.fusebmc-ia-coverage-error-call-confirmed.fusebmc-ia-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.fusebmc-ia-coverage-error-call-confirmed.fusebmc-ia-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.klee-coverage-error-call-confirmed.klee-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.klee-coverage-error-call-confirmed.klee-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.kleef-coverage-error-call-confirmed.kleef-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.kleef-coverage-error-call-confirmed.kleef-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.legion-coverage-error-call-confirmed.legion-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.legion-symcc-coverage-error-call-confirmed.legion-symcc-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.owic-coverage-error-call-confirmed.owic-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.prtest-coverage-error-call-confirmed.prtest-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.legion-coverage-error-call-confirmed.legion-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.legion-symcc-coverage-error-call-confirmed.legion-symcc-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.owic-coverage-error-call-confirmed.owic-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.rizzer-coverage-error-call-confirmed.rizzer-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.prtest-coverage-error-call-confirmed.prtest-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.rizzer-coverage-error-call-confirmed.rizzer-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.symbiotic-coverage-error-call-confirmed.symbiotic-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.symbiotic-coverage-error-call-confirmed.symbiotic-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.tracerx-coverage-error-call-confirmed.tracerx-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.tracerx-coverage-error-call-confirmed.tracerx-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.utestgen-coverage-error-call-confirmed.utestgen-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.wasp-c-coverage-error-call-confirmed.wasp-c-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.utestgen-coverage-error-call-confirmed.utestgen-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.wasp-c-coverage-error-call-confirmed.wasp-c-test-suites.xml.bz2
INFO: Merging results...
INFO: The resulting table will have 10941 rows and 6 columns (in 2 run sets).
INFO: Generating table...
INFO: Writing CSV  into table-defs/validation.table.csv ...
INFO: done
Generated validation.csv successfully
Generating creation.table.csv...
PYTHONPATH=tool_info benchexec/bin/table-generator -x table-defs/creation.xml -f csv --no-diff -o table-defs/
INFO: Reading table definition from 'table-defs/creation.xml'...
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.cetfuzz-coverage-error-call-confirmed.cetfuzz-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.esbmc-kind-coverage-error-call-confirmed.esbmc-kind-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.fdse-coverage-error-call-confirmed.fdse-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.fizzer-coverage-error-call-confirmed.fizzer-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.fusebmc-coverage-error-call-confirmed.fusebmc-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.fusebmc-ia-coverage-error-call-confirmed.fusebmc-ia-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.klee-coverage-error-call-confirmed.klee-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.kleef-coverage-error-call-confirmed.kleef-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.legion-coverage-error-call-confirmed.legion-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.legion-symcc-coverage-error-call-confirmed.legion-symcc-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.owic-coverage-error-call-confirmed.owic-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.prtest-coverage-error-call-confirmed.prtest-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.rizzer-coverage-error-call-confirmed.rizzer-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.symbiotic-coverage-error-call-confirmed.symbiotic-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.tracerx-coverage-error-call-confirmed.tracerx-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.utestgen-coverage-error-call-confirmed.utestgen-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.wasp-c-coverage-error-call-confirmed.wasp-c-test-suites.xml.bz2
INFO: Merging results...
INFO: The resulting table will have 10941 rows and 3 columns (in 1 run sets).
INFO: Generating table...
INFO: Writing CSV  into table-defs/creation.table.csv ...
INFO: done
Generated creation.table.csv successfully
Copying data for plots...
mkdir -p analysis/raw/
cp table-defs/portfolio-cpachecker.table.csv analysis/raw/
cp table-defs/portfolio-esbmc-kind.table.csv analysis/raw/
cp table-defs/portfolio-uautomizer.table.csv analysis/raw/
cp table-defs/portfolio-symbiotic.table.csv analysis/raw/
cp table-defs/portfolio-bubaak.table.csv analysis/raw/
cp table-defs/portfolio-cpv.table.csv analysis/raw/
cp data/reach_safety.table.csv analysis/raw/
cp results-verified/META_Cover-Error.table.csv analysis/raw/
cp table-defs/creation.table.csv analysis/raw/
cp table-defs/validation.table.csv analysis/raw/
Copied data successfully.
Creating plots...
cd analysis && python3 plot.py
RQ1: Converted tasks (done): 10897
< 1.0s: 10490/10897 (96.27%)
< 0.2s: 8069/10897 (74.05%)
Created plots successfully.
Plots are in the analysis/final_plots/ directory

After this command has completed, you may compare Figures 07 and 08 with the ones in the paper or in the folder ~/fase26/expected.
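The RQ1 lines in the output above are plain shares of converted tasks below a CPU-time threshold. The following is a minimal sketch of that arithmetic, not the code of analysis/plot.py; the helper names `share_below` and `format_share` are hypothetical and only used for illustration.

```python
def share_below(cpu_times, threshold):
    """Return (count, total, percent) of runs whose CPU time is below threshold seconds."""
    total = len(cpu_times)
    count = sum(1 for t in cpu_times if t < threshold)
    percent = 100.0 * count / total if total else 0.0
    return count, total, percent


def format_share(threshold, count, total, percent):
    # Mirrors the "< 1.0s: 10490/10897 (96.27%)" layout of the expected output.
    return f"< {threshold}s: {count}/{total} ({percent:.2f}%)"


# Recompute the percentages from the counts reported in the expected output.
print(format_share(1.0, 10490, 10897, 100.0 * 10490 / 10897))
print(format_share(0.2, 8069, 10897, 100.0 * 8069 / 10897))
```

If your own run reports different counts, the same two calls show whether the printed percentages are consistent with them.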

3.4 RQ3: Efficiency of Parallel Portfolio Experiments

In RQ3 we claim that, while the portfolio on average consumes more CPU time than the verifier alone, there are tasks where the portfolio consumes less. We demonstrate this with one example using uautomizer and afltc.

First, back up the portfolio results:

cd $HOME/fase26
mv results-portfolio results-portfolio.bak

Then we need to make a small adjustment such that AFL (part of the afltc tester) works:

echo core | sudo tee /proc/sys/kernel/core_pattern

When prompted for the password enter: fase26
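To confirm that the adjustment took effect, you can read the setting back. AFL refuses to start when core dumps are piped to an external helper, that is, when the pattern starts with `|`. The sketch below checks exactly that; the function name `core_pattern_is_plain` is a hypothetical helper, not part of the artifact.

```python
from pathlib import Path


def core_pattern_is_plain(path="/proc/sys/kernel/core_pattern"):
    """Return True if core dumps go to a plain file rather than a piped helper."""
    pattern = Path(path).read_text().strip()
    # A leading '|' means the kernel pipes core dumps to a program (e.g. apport).
    return not pattern.startswith("|")
```

Calling `core_pattern_is_plain()` inside the VM after the `tee` command should return `True`. The setting is not persistent across reboots, so repeat the command after restarting the VM.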

To execute a minimal portfolio run for this example, run:

make portfolio-rq3

This executes Ultimate Automizer in a portfolio with afltc and then Ultimate Automizer standalone, both on the task sv-benchmarks/c/array-fpi/eqn1f.yml.

Expected Output:

PYTHONPATH=combinations benchexec/bin/benchexec --read-only-dir=/ --hidden-dir /tmp --overlay-dir=/home -o results-portfolio/  --tool-directory combinations benchmark-defs/combinations/parallel-uautomizer-afltc-rq3.xml 
2026-01-08 15:36:23 - WARNING - Cannot determine combinations/ppexec version, error output: mkdir: cannot create directory ‘/sys/fs/cgroup/init’: Read-only file system
mkdir: cannot create directory ‘/sys/fs/cgroup/benchexec’: Read-only file system

2026-01-08 15:36:23 - WARNING - Ignoring specified resource requirements in local-execution mode, only resource limits are used.
2026-01-08 15:36:23 - INFO - Unable to find pqos_wrapper, please install it for cache allocation and monitoring if your CPU supports Intel RDT (cf. https://gitlab.com/sosy-lab/software/pqos-wrapper).

executing run set 'pp-uautomizer-afltc.eqn1f'     (1 file)
15:36:23   eqn1f.yml               false                       41.73   18.45

Statistics:              1 Files
  correct:               1
    correct true:        0
    correct false:       1
  incorrect:             0
    incorrect true:      0
    incorrect false:     0
  unknown:               0
  Score:                 1 (max: 1)

In order to get HTML and CSV tables, run
benchexec/bin/table-generator results-portfolio/parallel-uautomizer-afltc-rq3.2026-01-08_15-36-23.results.pp-uautomizer-afltc.eqn1f.xml.bz2
PYTHONPATH=combinations benchexec/bin/benchexec --read-only-dir=/ --hidden-dir /tmp --overlay-dir=/home -o results-portfolio/  --tool-directory combinations -o results-baseline/ benchmark-defs/combinations/uautomizer-rq3.xml 
2026-01-08 15:36:43 - WARNING - Cannot determine combinations/ppexec version, error output: mkdir: cannot create directory ‘/sys/fs/cgroup/init’: Read-only file system
mkdir: cannot create directory ‘/sys/fs/cgroup/benchexec’: Read-only file system

2026-01-08 15:36:43 - WARNING - Ignoring specified resource requirements in local-execution mode, only resource limits are used.
2026-01-08 15:36:43 - INFO - Unable to find pqos_wrapper, please install it for cache allocation and monitoring if your CPU supports Intel RDT (cf. https://gitlab.com/sosy-lab/software/pqos-wrapper).

executing run set 'uautomizer-reference.eqn1f'     (1 file)
15:36:43   eqn1f.yml               false                       83.05   48.93

Statistics:              1 Files
  correct:               1
    correct true:        0
    correct false:       1
  incorrect:             0
    incorrect true:      0
    incorrect false:     0
  unknown:               0
  Score:                 1 (max: 1)

In order to get HTML and CSV tables, run
benchexec/bin/table-generator results-baseline/uautomizer-rq3.2026-01-08_15-36-43.results.uautomizer-reference.eqn1f.xml.bz2

The first run shows the portfolio of uautomizer and afltc, which consumes (in the example above) 41.73s of CPU time. Compare that to the second run, which is uautomizer alone and consumes 83.05s of CPU time, roughly twice as much. Exact values will vary between machines and runs.
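Instead of reading the CPU times off the console, they can also be extracted from the result files named in the output. The sketch below assumes the usual BenchExec result layout, in which each `<run>` element carries `<column title="..." value="..."/>` children and CPU time is stored as a value such as `41.73s`. The function name `read_runs` is hypothetical.

```python
import bz2
import xml.etree.ElementTree as ET


def read_runs(result_file):
    """Map each run name to its status and CPU time (seconds) from a .xml.bz2 result file."""
    with bz2.open(result_file, "rb") as handle:
        root = ET.parse(handle).getroot()
    runs = {}
    for run in root.iter("run"):
        # Assumed layout: <column title="cputime" value="41.73s"/> inside each <run>.
        columns = {c.get("title"): c.get("value") for c in run.iter("column")}
        cputime = columns.get("cputime")
        runs[run.get("name")] = {
            "status": columns.get("status"),
            "cputime": float(cputime.rstrip("s")) if cputime else None,
        }
    return runs
```

Applying `read_runs` to the file in results-portfolio/ and to the one in results-baseline/ gives the two CPU times to compare. The timestamps in the file names will differ from the ones shown above.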

3.5 RQ4: Effectiveness of Parallel Portfolio Experiments

In RQ4 we claim that the portfolios can solve tasks that the verifier alone could not solve. To show this, we select one task, sv-benchmarks/c/reducercommutativity/rangesum.yml, and the portfolio of cpachecker and fusebmc.

⚠️ We assume you have moved results-portfolio to results-portfolio.bak in Step 3.4.

To run the example task execute:

make portfolio-rq4

This will first run the portfolio cpachecker-fusebmc on the task, then cpachecker alone.
⚠️ The expected behavior is that cpachecker alone times out after 900s, so this target takes a little over 16 minutes to complete.

Expected Output:

PYTHONPATH=combinations benchexec/bin/benchexec --read-only-dir=/ --hidden-dir /tmp --overlay-dir=/home -o results-portfolio/  --tool-directory combinations benchmark-defs/combinations/parallel-cpachecker-fusebmc-rq4.xml 
2026-01-08 15:42:53 - WARNING - Cannot determine combinations/ppexec version, error output: mkdir: cannot create directory ‘/sys/fs/cgroup/init’: Read-only file system
mkdir: cannot create directory ‘/sys/fs/cgroup/benchexec’: Read-only file system

2026-01-08 15:42:53 - WARNING - Ignoring specified resource requirements in local-execution mode, only resource limits are used.
2026-01-08 15:42:53 - INFO - Unable to find pqos_wrapper, please install it for cache allocation and monitoring if your CPU supports Intel RDT (cf. https://gitlab.com/sosy-lab/software/pqos-wrapper).

executing run set 'pp-cpachecker-fusebmc.rangesum'     (1 file)
15:42:53   rangesum.yml            false                       35.76   19.78

Statistics:              1 Files
  correct:               1
    correct true:        0
    correct false:       1
  incorrect:             0
    incorrect true:      0
    incorrect false:     0
  unknown:               0
  Score:                 1 (max: 1)

In order to get HTML and CSV tables, run
benchexec/bin/table-generator results-portfolio/parallel-cpachecker-fusebmc-rq4.2026-01-08_15-42-53.results.pp-cpachecker-fusebmc.rangesum.xml.bz2
PYTHONPATH=combinations benchexec/bin/benchexec --read-only-dir=/ --hidden-dir /tmp --overlay-dir=/home -o results-portfolio/  --tool-directory combinations -o results-baseline/ benchmark-defs/combinations/cpachecker-rq4.xml 
2026-01-08 15:43:14 - WARNING - Cannot determine combinations/ppexec version, error output: mkdir: cannot create directory ‘/sys/fs/cgroup/init’: Read-only file system
mkdir: cannot create directory ‘/sys/fs/cgroup/benchexec’: Read-only file system

2026-01-08 15:43:14 - WARNING - Ignoring specified resource requirements in local-execution mode, only resource limits are used.
2026-01-08 15:43:14 - INFO - Unable to find pqos_wrapper, please install it for cache allocation and monitoring if your CPU supports Intel RDT (cf. https://gitlab.com/sosy-lab/software/pqos-wrapper).

executing run set 'cpachecker-reference.rangesum'     (1 file)
15:43:14   rangesum.yml            TIMEOUT                    902.80  869.79

Statistics:              1 Files
  correct:               0
    correct true:        0
    correct false:       0
  incorrect:             0
    incorrect true:      0
    incorrect false:     0
  unknown:               1
  Score:                 0 (max: 1)

In order to get HTML and CSV tables, run
benchexec/bin/table-generator results-baseline/cpachecker-rq4.2026-01-08_15-43-14.results.cpachecker-reference.rangesum.xml.bz2

We can see the portfolio solves the task in 35.76s CPU time, while cpachecker alone runs into a timeout.
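The comparison behind RQ4 amounts to finding tasks for which the portfolio reports a verdict while the baseline does not. A minimal sketch under stated assumptions: statuses are given as plain strings per task, a status starting with `true` or `false` counts as a verdict, and correctness of the verdict is not checked here. The names `is_verdict` and `newly_solved` are hypothetical.

```python
def is_verdict(status):
    """True for statuses such as 'true' or 'false(unreach-call)', False for TIMEOUT etc."""
    return status is not None and status.lower().startswith(("true", "false"))


def newly_solved(portfolio, baseline):
    """Return the tasks with a verdict in the portfolio run but none in the baseline run."""
    return sorted(
        task
        for task, status in portfolio.items()
        if is_verdict(status) and not is_verdict(baseline.get(task))
    )


# Statuses taken from the two runs shown above.
print(newly_solved({"rangesum.yml": "false"}, {"rangesum.yml": "TIMEOUT"}))
```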

4. Running the Full Set of Experiments

⚠️ Warning: Long-running experiments

The full experimental evaluation is computationally expensive:

  • Witness generation may take 10–20 minutes inside the VM.

  • Witness validation may take several hours up to a full day.

  • The parallel portfolio experiments will take multiple days to weeks on a 4-core setup.

Proceed only if you are prepared for these runtimes.

4.1 RQ1: Full Witness Generation

Follow the setup from 3.1. Then, instead of the small run, run the full experiment.

cd $HOME/fase26
make witnesses-from-testsuites

4.2 RQ1: Full Witness Validation

Follow the setup from 3.2. Then, instead of the small run, run the full experiment.

make validate-generated-witnesses

⚠️ This benchmark tries to match the entire sv-benchmarks suite against the existing witnesses. This leads to a lot of messages of the form:

2026-01-08 20:45:21 - WARNING - No files found matching '../results/test2witness.2025-10-16_10-29-27.files/${rundefinition_name}/${taskdef_name}/output/witness.yml'.
2026-01-08 20:45:21 - WARNING - Pattern ../results/test2witness.2025-10-16_10-29-27.files/${rundefinition_name}/${taskdef_name}/output/witness.yml in requiredfiles tag did not match any file for task sv-benchmarks/c/xcsp/aim-200-6-0-sat-4.yml.

This is normal and expected.
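To see in advance which tasks actually have a witness (and therefore will not trigger these warnings), you can list the witness files on disk. The sketch below assumes the directory layout implied by the pattern in the warning, `<files dir>/<run definition>/<task>/output/witness.yml`. The function name `witnesses_by_run_definition` is hypothetical.

```python
from pathlib import Path


def witnesses_by_run_definition(files_dir):
    """Group task names that have an output/witness.yml by run definition."""
    found = {}
    # Assumed layout: <files_dir>/<run definition>/<task>/output/witness.yml
    for witness in Path(files_dir).glob("*/*/output/witness.yml"):
        task_dir = witness.parent.parent
        found.setdefault(task_dir.parent.name, []).append(task_dir.name)
    return {name: sorted(tasks) for name, tasks in found.items()}
```

For example, `witnesses_by_run_definition("results/test2witness.2025-10-16_10-29-27.files")` returns one list of tasks per run definition. Every task not listed there produces the pair of warnings shown above.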

4.3 RQ2: Test Generators in SV-COMP

The steps are identical to 3.3. Nothing new is to be done.

4.4 RQ3 & RQ4: Full Parallel Portfolio Evaluation

Follow the setup from 3.4. Then, instead of just our examples, run the full experiment.

⚠️ Warning: Even a single combination will take several hours to days to complete on a 4-core VM.

Evaluated Tools

  • Verifiers: CPAchecker, UAutomizer, ESBMC-kind, Symbiotic, Bubaak, CPV

  • Test Generators: prtest, fusebmc, fizzer, afltc

Each combination corresponds to a Make target of the form <verifier>-<test-generator>.

To run a specific combination, for example CPAchecker with prtest, execute:

make cpachecker-prtest

To run all 24 evaluated combinations, execute:

make par-all
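The 24 combinations are the cross product of the 6 verifiers and 4 test generators. The sketch below enumerates the target names that the pattern `<verifier>-<test-generator>` suggests. Only `cpachecker-prtest` is confirmed by this document; the lowercase spellings of the other names are assumptions derived from file names elsewhere in the artifact, so check them against the Makefile before use.

```python
from itertools import product

# Lowercase names are assumptions, except for cpachecker and prtest.
VERIFIERS = ["cpachecker", "uautomizer", "esbmc-kind", "symbiotic", "bubaak", "cpv"]
TEST_GENERATORS = ["prtest", "fusebmc", "fizzer", "afltc"]


def portfolio_targets():
    """Return one '<verifier>-<test-generator>' name per combination."""
    return [f"{v}-{t}" for v, t in product(VERIFIERS, TEST_GENERATORS)]


for target in portfolio_targets():
    print(f"make {target}")
```

Running single targets from this list lets you spread the combinations over several machines instead of running `make par-all` on one.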

4.5 Recreating tables and plots

To completely recreate the plots and their required data (after the experiments have run), use the command below.

⚠️ Note: This runs a Python script that permanently updates table-defs/portfolio-*.xml to include the most recently run portfolios. We recommend backing up table-defs first with cp -r table-defs table-defs.bak.

make -B plots
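Because this step rewrites files in place, a backup that refuses to overwrite an earlier one is safer when the command is run more than once. A minimal sketch, equivalent in effect to a guarded `cp -r`; the function name `backup_directory` is hypothetical.

```python
import shutil
from pathlib import Path


def backup_directory(source, suffix=".bak"):
    """Copy source to source + suffix and fail if that backup already exists."""
    src = Path(source)
    dst = src.with_name(src.name + suffix)
    if dst.exists():
        raise FileExistsError(f"{dst} already exists; refusing to overwrite it")
    shutil.copytree(src, dst)
    return dst
```

Calling `backup_directory("table-defs")` from $HOME/fase26 before `make -B plots` keeps the original table definitions in table-defs.bak.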
