Artifact for the FASE 26 Paper: Testing in Formal Verification via Witness Generation (Empirical Evaluation)

Beyer, Dirk; Lemberger, Thomas; Wachowitz, Henrik

doi:10.5281/zenodo.18190957

Published January 9, 2026 | Version fase26-artifact-evaluation


1. Ludwig-Maximilians-Universität München
2. LMU Munich

Artifact for the FASE 26 Paper

Testing in Formal Verification via Witness Generation (Empirical Evaluation)

This artifact guides you through reproducing the plots and tables presented in the paper and through rerunning the experimental evaluation.

The artifact is shipped as a VirtualBox VM.
- The artifact is contained within an Ubuntu 24.04 VM.
- We require at least 16 GB of memory and 4 cores.

Getting Started

Username and Password
user: test-to-witness
password: fase26

This artifact contains:

analysis/ : Files concerning plots
backup/ : A backup of the experiment data
benchexec/ : The benchmarking tool
benchmark-defs/ : Files defining benchmarks
benchmark-sets/ : Files describing sets of tasks
combinations/ : Code of the portfolio
data/ : data from SV-COMP 25
expected/ : The expected plots
Makefile
results/ : Here are the generated witnesses from test-to-witness
results-baseline/ : results of the baseline (just the verifier) for the portfolios
results-portfolio/ : results of the portfolio experiments
results-validate/ : results of the validation of generated witnesses
results-verified/ : Data from Test-Comp 25
sv-benchmarks/ : The sv-benchmark tasks
table-defs/ : Files defining how to create csv tables from the raw experiment data
test-to-witness/ : The tool test-to-witness
tool_info/ : Files telling benchexec how to run our tools
validators/ : The validators used in the experiment

If you are interested only in the tool Test-To-Witness, you may navigate into the test-to-witness folder, which includes a README on how to use test-to-witness. Please note: we did not install a Go distribution in the VM, so it is not possible to recompile test-to-witness there. For the most up-to-date version, we refer readers to our public GitLab repository: https://gitlab.com/sosy-lab/software/test-to-witness.

This artifact document consists of the following steps:

1. Downloading and installing the VM
2. Recreating the plots and tables from the paper
3. Running a subset of the experiments
4. Running the full set of experiments

To get started, we recommend completing Sections 1, 2, and 3.1; together they should take less than 30 min.

Sections 2, 3 and 4 also serve as the Step-by-Step guide.

1. Download and Installation

Instructions

Download the artifact VM from Zenodo using the DOI provided with the paper. The download consists of a single VirtualBox appliance file with the extension .ova.
Install Oracle VirtualBox (version 7 or newer is recommended) from the official website.
Start VirtualBox and select File → Import Appliance.
Choose the downloaded .ova file and follow the import wizard. You may adjust the VM settings during import; we require allocating at least 16 GB of RAM and 4 CPU cores.
After the import finishes, select the VM from the list and click Start to launch it.
Log in to the VM using the credentials: user: test-to-witness pw: fase26

Inside the VM

To keep the VM download as small as possible, we compressed the data inside the VM. Before starting the evaluation, please decompress the data. This takes approximately 15-30 min. The resulting size is approximately 49 GB.

pigz -dc fase26.tar.gz | pv | tar xf -
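Because decompression needs roughly 49 GB, it can help to check the free disk space first. The following is a minimal sketch, not part of the artifact: the function name has_enough_space and the 5 GB safety margin are assumptions made for illustration.

```python
import shutil

REQUIRED_GB = 49  # approximate decompressed size stated above


def has_enough_space(path: str, required_gb: float, margin_gb: float = 5.0) -> bool:
    """Return True if the filesystem holding `path` has room for the data plus a margin."""
    free_gb = shutil.disk_usage(path).free / 1e9  # bytes -> gigabytes
    return free_gb >= required_gb + margin_gb


if __name__ == "__main__":
    # Check the directory in which the archive will be unpacked.
    ok = has_enough_space(".", REQUIRED_GB)
    print("enough free space" if ok else "not enough free space")
```

Run it from the directory that contains fase26.tar.gz before starting the pigz command.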

2. Recreating the Plots and Tables of the Paper

To recreate the plots, first open a terminal and change into the directory containing all data and scripts:

cd $HOME/fase26

To generate all plots and tables used in the paper, run the following command (this takes approximately 1–2 minutes):

make plots

Expected Output:

Copying data for plots...
mkdir -p analysis/raw/
cp table-defs/portfolio-cpachecker.table.csv analysis/raw/
cp table-defs/portfolio-esbmc-kind.table.csv analysis/raw/
cp table-defs/portfolio-uautomizer.table.csv analysis/raw/
cp table-defs/portfolio-symbiotic.table.csv analysis/raw/
cp table-defs/portfolio-bubaak.table.csv analysis/raw/
cp table-defs/portfolio-cpv.table.csv analysis/raw/
cp data/reach_safety.table.csv analysis/raw/
cp results-verified/META_Cover-Error.table.csv analysis/raw/
cp table-defs/creation.table.csv analysis/raw/
cp table-defs/validation.table.csv analysis/raw/
Copied data successfully.
Creating plots...
cd analysis && python3 plot.py
RQ1: Converted tasks (done): 10897
< 1.0s: 10490/10897 (96.27%)
< 0.2s: 8069/10897 (74.05%)
Created plots successfully.
Plots are in the analysis/final_plots/ directory

The generated plots are located in analysis/final_plots/.
For convenience, we also provide the expected plots and tables in the directory expected/ so that you can directly compare results.
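As a first comparison, one can check that every file in expected/ has a counterpart in analysis/final_plots/. The sketch below compares file names only, since regenerated PDFs are not necessarily byte-identical. It assumes that both directories are flat and use the same file names, which the artifact description does not state explicitly; compare_file_names is a hypothetical helper.

```python
from pathlib import Path


def compare_file_names(generated: Path, expected: Path) -> tuple[set[str], set[str]]:
    """Return (missing, extra): names only in `expected`, and names only in `generated`."""
    gen = {p.name for p in generated.iterdir() if p.is_file()}
    exp = {p.name for p in expected.iterdir() if p.is_file()}
    return exp - gen, gen - exp


if __name__ == "__main__":
    gen_dir, exp_dir = Path("analysis/final_plots"), Path("expected")
    if gen_dir.is_dir() and exp_dir.is_dir():
        missing, extra = compare_file_names(gen_dir, exp_dir)
        print("missing from final_plots:", sorted(missing))
        print("not in expected:", sorted(extra))
    else:
        print("run this from the fase26 directory after `make plots`")
```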

3. Running a Subset of the Experiments

Because the full experiments are time-consuming, we provide a smoke-test mode that executes a small but representative subset of the experiments.

There are three experiments in the paper:

Creating witnesses from test suites
Validating the generated witnesses
Running parallel verifier–test-generator portfolios

Backup of Results

The experiments write to the same files that the authors' original experiments wrote to. The benchmarking utility benchexec terminates with an error if one of the old experiment files is still present. In the following steps, we therefore manually move result directories out of the way before each new set of experiments is run. An additional backup of all original results is stored in backups/backup.tar.gz. It can be restored with: make restore-experiment-data
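One detail to watch when moving result directories: if a .bak directory from an earlier attempt already exists, a plain mv places the directory inside it instead of renaming it. The following sketch shows a more defensive variant of the move; move_aside is a hypothetical helper and not part of the artifact's Makefile.

```python
import shutil
from pathlib import Path
from typing import Optional


def move_aside(directory: Path) -> Optional[Path]:
    """Rename `directory` to `<directory>.bak`; never overwrite an existing backup."""
    if not directory.exists():
        return None  # nothing to move
    backup = directory.with_name(directory.name + ".bak")
    if backup.exists():
        raise FileExistsError(f"{backup} already exists; move or delete it first")
    shutil.move(str(directory), str(backup))
    return backup
```

For example, move_aside(Path("results")) corresponds to the mv results results.bak step used in Section 3.1, but stops with an error if results.bak is already present.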

3.1 RQ1: Creating Witnesses from Test Suites

To generate witnesses from test cases, execute:

cd $HOME/fase26
mv results results.bak
make small-witnesses-from-testsuites

Expected Output:

PYTHONPATH=tool_info benchexec/bin/benchexec --read-only-dir=/ --hidden-dir /tmp --overlay-dir=/home --no-compress-results -o results/ --tool-directory test-to-witness --startTime '2025-10-16 10:29:27' benchmark-defs/small/test2witness.xml
2026-01-08 14:09:59 - WARNING - No propertyfile specified. Score computation will ignore the results.
2026-01-08 14:09:59 - INFO - Unable to find pqos_wrapper, please install it for cache allocation and monitoring if your CPU supports Intel RDT (cf. https://gitlab.com/sosy-lab/software/pqos-wrapper).

executing run set 'esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites'     (1 file)
14:09:59   06_fuzzle_50x50_0-cycle.yml    done                         2.30    2.67

executing run set 'fusebmc-coverage-error-call-confirmed.fusebmc-test-suites'     (1 file)
14:10:02   array_ptr_single_elem_init-2.yml    done                         5.64    3.54

executing run set 'hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites'     (1 file)
14:10:06   btor2c-lazyMod.driving_phils.3.prop1-func-interl.yml    done                         1.33    1.72

executing run set 'kleef-coverage-error-call-confirmed.kleef-test-suites'     (1 file)
14:10:08   s4liff.yml              done                         0.26    0.49

executing run set 'owic-coverage-error-call-confirmed.owic-test-suites'     (1 file)
14:10:08   elevator_spec3_product23.cil.yml    done                         0.33    0.57

executing run set 'tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites'     (1 file)
14:10:09   list-2.yml              done                       175.25  167.49

executing run set 'utestgen-coverage-error-call-confirmed.utestgen-test-suites'     (1 file)
14:12:57   rangesum.yml            done                         0.26    0.44

executing run set 'wasp-c-coverage-error-call-confirmed.wasp-c-test-suites'     (1 file)
14:12:58   brs5f.yml               done                         0.21    0.70

Statistics:              8 Files
  correct:               0
    correct true:        0
    correct false:       0
  incorrect:             0
    incorrect true:      0
    incorrect false:     0
  unknown:               0

In order to get HTML and CSV tables, run
benchexec/bin/table-generator results/test2witness.2025-10-16_10-29-27.results.esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites.xml results/test2witness.2025-10-16_10-29-27.results.fusebmc-coverage-error-call-confirmed.fusebmc-test-suites.xml results/test2witness.2025-10-16_10-29-27.results.hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites.xml results/test2witness.2025-10-16_10-29-27.results.kleef-coverage-error-call-confirmed.kleef-test-suites.xml results/test2witness.2025-10-16_10-29-27.results.owic-coverage-error-call-confirmed.owic-test-suites.xml results/test2witness.2025-10-16_10-29-27.results.tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites.xml results/test2witness.2025-10-16_10-29-27.results.utestgen-coverage-error-call-confirmed.utestgen-test-suites.xml results/test2witness.2025-10-16_10-29-27.results.wasp-c-coverage-error-call-confirmed.wasp-c-test-suites.xml

This command generates witnesses from test suites produced by 8 different test-case generators; the selected tasks represent the range of runtimes a conversion can take.

We recreate the creation.table.csv used by the plot script with our newly created data:

make -B table-defs/creation.table.csv

Expected Output:

Generating creation.table.csv...
benchexec/bin/table-generator -x table-defs/creation.xml -f csv --no-diff -o table-defs/
INFO: Reading table definition from 'table-defs/creation.xml'...
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.cetfuzz-coverage-error-call-confirmed.cetfuzz-test-suites.xml.bz2'.
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites.xml.bz2
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.esbmc-kind-coverage-error-call-confirmed.esbmc-kind-test-suites.xml.bz2'.
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.fdse-coverage-error-call-confirmed.fdse-test-suites.xml.bz2'.
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.fizzer-coverage-error-call-confirmed.fizzer-test-suites.xml.bz2'.
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.fusebmc-coverage-error-call-confirmed.fusebmc-test-suites.xml.bz2
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.fusebmc-ia-coverage-error-call-confirmed.fusebmc-ia-test-suites.xml.bz2'.
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites.xml.bz2
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.klee-coverage-error-call-confirmed.klee-test-suites.xml.bz2'.
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.kleef-coverage-error-call-confirmed.kleef-test-suites.xml.bz2
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.legion-coverage-error-call-confirmed.legion-test-suites.xml.bz2'.
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.legion-symcc-coverage-error-call-confirmed.legion-symcc-test-suites.xml.bz2'.
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.owic-coverage-error-call-confirmed.owic-test-suites.xml.bz2
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.prtest-coverage-error-call-confirmed.prtest-test-suites.xml.bz2'.
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.rizzer-coverage-error-call-confirmed.rizzer-test-suites.xml.bz2'.
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.symbiotic-coverage-error-call-confirmed.symbiotic-test-suites.xml.bz2'.
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.tracerx-coverage-error-call-confirmed.tracerx-test-suites.xml.bz2'.
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.utestgen-coverage-error-call-confirmed.utestgen-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.wasp-c-coverage-error-call-confirmed.wasp-c-test-suites.xml.bz2
WARNING: Missing module ".testtowitness", cannot extract values from log files (ImportError: No module named '.testtowitness').
INFO: Merging results...
INFO: The resulting table will have 8 rows and 3 columns (in 1 run sets).
INFO: Generating table...
INFO: Writing CSV  into table-defs/creation.table.csv ...
INFO: done
Generated creation.table.csv successfully
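The "No file matches" warnings appear to be expected in smoke-test mode: the table definition lists more test generators than the 8 that were just run. The following sketch separates the two groups in a table-generator log. The regular expression is an assumption derived from the file names shown above, and split_by_availability is a hypothetical helper.

```python
import re

# Extracts the test-generator name from a result-file name as shown in the log above.
PATTERN = re.compile(r"results\.(?P<tool>[a-z0-9-]+)-coverage-error-call-confirmed\.")


def split_by_availability(log_lines):
    """Return (found, missing) test-generator names from table-generator log lines."""
    found, missing = [], []
    for line in log_lines:
        match = PATTERN.search(line)
        if match is None:
            continue  # unrelated log line
        target = missing if line.startswith("WARNING: No file matches") else found
        target.append(match.group("tool"))
    return found, missing


# A few lines copied from the expected output above.
sample = [
    "WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.cetfuzz-coverage-error-call-confirmed.cetfuzz-test-suites.xml.bz2'.",
    "INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites.xml.bz2",
    "INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.fusebmc-coverage-error-call-confirmed.fusebmc-test-suites.xml.bz2",
    "INFO: Merging results...",
]
found, missing = split_by_availability(sample)
print("found:", found)
print("missing:", missing)
```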

After the command finishes, we run the plot generation again:

make plots

Expected Output:

Copying data for plots...
mkdir -p analysis/raw/
cp table-defs/portfolio-cpachecker.table.csv analysis/raw/
cp table-defs/portfolio-esbmc-kind.table.csv analysis/raw/
cp table-defs/portfolio-uautomizer.table.csv analysis/raw/
cp table-defs/portfolio-symbiotic.table.csv analysis/raw/
cp table-defs/portfolio-bubaak.table.csv analysis/raw/
cp table-defs/portfolio-cpv.table.csv analysis/raw/
cp data/reach_safety.table.csv analysis/raw/
cp results-verified/META_Cover-Error.table.csv analysis/raw/
cp table-defs/creation.table.csv analysis/raw/
cp table-defs/validation.table.csv analysis/raw/
Copied data successfully.
Creating plots...
cd analysis && python3 plot.py
RQ1: Converted tasks (done): 8
< 1.0s: 4/8 (50.00%)
< 0.2s: 0/8 (0.00%)
Created plots successfully.
Plots are in the analysis/final_plots/ directory

Taking a look at analysis/final_plots/Figure06.pdf now shows an updated version of the plot using the few data points that were just created.

open analysis/final_plots/Figure06.pdf
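The two summary lines printed for RQ1 can be recomputed by hand from the run times in the expected benchexec output of this section. The sketch below uses the first time column of the eight runs; which column plot.py actually reads is not shown here, so this choice is an assumption (the second column yields the same counts). Times on your machine will differ.

```python
def share_below(times: list[float], threshold: float) -> str:
    """Format the share of runs that finished strictly below `threshold` seconds."""
    count = sum(1 for t in times if t < threshold)
    # Assumes a non-empty list of times.
    return f"< {threshold}s: {count}/{len(times)} ({100 * count / len(times):.2f}%)"


# First time column of the eight runs in the expected output of Section 3.1.
times = [2.30, 5.64, 1.33, 0.26, 0.33, 175.25, 0.26, 0.21]

for threshold in (1.0, 0.2):
    print(share_below(times, threshold))
```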

3.2 RQ1: Validating the Generated Witnesses

If you completed Step 3.1, you have two options:

1. Continue with the witnesses you just generated.
2. Restore and use the witnesses generated by the authors.

If you choose option 2, restore the original results. Warning: This will delete the current results directory.

[ -d results.bak ] && rm -r results && mv results.bak results

To validate witnesses using both validators employed in the paper (CPAchecker and Witch), run:

mv results-validate results-validate.bak
make small-validate-generated-witnesses

⚠️ This command runs for approximately 10-15 min, depending on your system.
Expected Output:

benchexec/bin/benchexec --read-only-dir=/ --hidden-dir /tmp --overlay-dir=/home -o results-validate/ --startTime '2025-08-05 23:18:21' --tool-directory validators/witch benchmark-defs/small/witch-validate-violation-witnesses-2.0.xml 
2026-01-08 14:41:34 - WARNING - No propertyfile specified. Score computation will ignore the results.
2026-01-08 14:41:35 - WARNING - Ignoring specified resource requirements in local-execution mode, only resource limits are used.
2026-01-08 14:41:35 - INFO - Unable to find pqos_wrapper, please install it for cache allocation and monitoring if your CPU supports Intel RDT (cf. https://gitlab.com/sosy-lab/software/pqos-wrapper).

executing run set 'esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites'     (1 file)
14:41:35   06_fuzzle_50x50_0-cycle.yml    false(unreach-call)         24.89   24.96

executing run set 'fusebmc-coverage-error-call-confirmed.fusebmc-test-suites'     (1 file)
14:42:00   array_ptr_single_elem_init-2.yml    TIMEOUT (validation preprocessing)   90.81   91.04

executing run set 'hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites'     (1 file)
14:43:32   btor2c-lazyMod.driving_phils.3.prop1-func-interl.yml    false(unreach-call)         16.85   16.88

executing run set 'kleef-coverage-error-call-confirmed.kleef-test-suites'     (1 file)
14:43:49   s4liff.yml              false(unreach-call)          1.79    1.80

executing run set 'owic-coverage-error-call-confirmed.owic-test-suites'     (1 file)
14:43:51   elevator_spec3_product23.cil.yml    false(unreach-call)          2.09    2.10

executing run set 'tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites'     (1 file)
14:43:53   list-2.yml              false(unreach-call)         36.12   36.17

executing run set 'utestgen-coverage-error-call-confirmed.utestgen-test-suites'     (1 file)
14:44:30   rangesum.yml            false(unreach-call)          2.06    2.07

executing run set 'wasp-c-coverage-error-call-confirmed.wasp-c-test-suites'     (1 file)
14:44:32   brs5f.yml               false(unreach-call)          2.16    2.21

Statistics:              8 Files
  correct:               0
    correct true:        0
    correct false:       0
  incorrect:             0
    incorrect true:      0
    incorrect false:     0
  unknown:               1

In order to get HTML and CSV tables, run
benchexec/bin/table-generator results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites.xml.bz2 results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.fusebmc-coverage-error-call-confirmed.fusebmc-test-suites.xml.bz2 results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites.xml.bz2 results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.kleef-coverage-error-call-confirmed.kleef-test-suites.xml.bz2 results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.owic-coverage-error-call-confirmed.owic-test-suites.xml.bz2 results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites.xml.bz2 results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.utestgen-coverage-error-call-confirmed.utestgen-test-suites.xml.bz2 results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.wasp-c-coverage-error-call-confirmed.wasp-c-test-suites.xml.bz2
benchexec/bin/benchexec --read-only-dir=/ --hidden-dir /tmp --overlay-dir=/home -o results-validate/ --startTime '2025-08-05 23:17:34' --tool-directory validators/cpachecker-4.0 benchmark-defs/small/cpachecker-validate-violation-witnesses-2.0.xml
2026-01-08 14:44:35 - WARNING - No propertyfile specified. Score computation will ignore the results.
2026-01-08 14:44:36 - WARNING - Ignoring specified resource requirements in local-execution mode, only resource limits are used.
2026-01-08 14:44:36 - INFO - Unable to find pqos_wrapper, please install it for cache allocation and monitoring if your CPU supports Intel RDT (cf. https://gitlab.com/sosy-lab/software/pqos-wrapper).

executing run set 'esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites'     (1 file)
14:44:36   06_fuzzle_50x50_0-cycle.yml    TIMEOUT                     97.30   73.34

executing run set 'fusebmc-coverage-error-call-confirmed.fusebmc-test-suites'     (1 file)
14:45:50   array_ptr_single_elem_init-2.yml    TIMEOUT                     96.14   59.47

executing run set 'hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites'     (1 file)
14:46:50   btor2c-lazyMod.driving_phils.3.prop1-func-interl.yml    true                        20.28   11.52

executing run set 'kleef-coverage-error-call-confirmed.kleef-test-suites'     (1 file)
14:47:01   s4liff.yml              true                         9.48    5.71

executing run set 'owic-coverage-error-call-confirmed.owic-test-suites'     (1 file)
14:47:07   elevator_spec3_product23.cil.yml    true                        15.90    9.06

executing run set 'tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites'     (1 file)
14:47:17   list-2.yml              true                        14.92    8.70

executing run set 'utestgen-coverage-error-call-confirmed.utestgen-test-suites'     (1 file)
14:47:26   rangesum.yml            true                        10.30    6.17

executing run set 'wasp-c-coverage-error-call-confirmed.wasp-c-test-suites'     (1 file)
14:47:32   brs5f.yml               true                         9.28    5.64

Statistics:              8 Files
  correct:               0
    correct true:        0
    correct false:       0
  incorrect:             0
    incorrect true:      0
    incorrect false:     0
  unknown:               2

In order to get HTML and CSV tables, run
benchexec/bin/table-generator results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites.xml.bz2 results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.fusebmc-coverage-error-call-confirmed.fusebmc-test-suites.xml.bz2 results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites.xml.bz2 results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.kleef-coverage-error-call-confirmed.kleef-test-suites.xml.bz2 results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.owic-coverage-error-call-confirmed.owic-test-suites.xml.bz2 results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites.xml.bz2 results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.utestgen-coverage-error-call-confirmed.utestgen-test-suites.xml.bz2 results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.wasp-c-coverage-error-call-confirmed.wasp-c-test-suites.xml.bz2

This validates the witnesses generated from the sample set of test suites. The output shows where a validator confirms the result: if the line of a task says false(unreach-call), the witness was confirmed. Our subset includes one case, array_ptr_single_elem_init-2.yml, where both validators run into a timeout. In Table 2 of the paper, this task would contribute to the ~15% of unconfirmed witnesses of fusebmc.
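The confirmation counts can be sketched from the two expected outputs above. The rule used here, that a witness counts as confirmed if at least one validator reports false(unreach-call), is an assumption based on the description in this section; how plot.py treats other statuses is not shown.

```python
CONFIRMING_STATUS = "false(unreach-call)"

# Statuses copied from the expected output above: (Witch, CPAchecker) per test generator.
results = {
    "esbmc-incr": ("false(unreach-call)", "TIMEOUT"),
    "fusebmc": ("TIMEOUT (validation preprocessing)", "TIMEOUT"),
    "hybridtiger": ("false(unreach-call)", "true"),
    "kleef": ("false(unreach-call)", "true"),
    "owic": ("false(unreach-call)", "true"),
    "tracerx-wp": ("false(unreach-call)", "true"),
    "utestgen": ("false(unreach-call)", "true"),
    "wasp-c": ("false(unreach-call)", "true"),
}


def is_confirmed(statuses) -> bool:
    """Assumed rule: at least one validator reports the confirming status."""
    return any(status == CONFIRMING_STATUS for status in statuses)


confirmed = [name for name, statuses in results.items() if is_confirmed(statuses)]
share = 100 * len(confirmed) / len(results)
print(f"{len(confirmed)}/{len(results)} confirmed ({share:.1f}%)")
```

Under this assumed rule, only fusebmc remains unconfirmed in the sample, since both of its validation runs end in a timeout.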

We can recreate the plots by first updating the validation.table.csv and then recreating the plots.

make -B table-defs/validation.table.csv
make plots

Expected Output:

Generating validation.xml...
Generated validation.xml successfully
Generating validation.csv...
benchexec/bin/table-generator -x table-defs/validation.xml --no-diff -o table-defs -f csv
INFO: Reading table definition from 'table-defs/validation.xml'...
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.fusebmc-coverage-error-call-confirmed.fusebmc-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.kleef-coverage-error-call-confirmed.kleef-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.owic-coverage-error-call-confirmed.owic-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.utestgen-coverage-error-call-confirmed.utestgen-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.wasp-c-coverage-error-call-confirmed.wasp-c-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.fusebmc-coverage-error-call-confirmed.fusebmc-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.kleef-coverage-error-call-confirmed.kleef-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.owic-coverage-error-call-confirmed.owic-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.utestgen-coverage-error-call-confirmed.utestgen-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.wasp-c-coverage-error-call-confirmed.wasp-c-test-suites.xml.bz2
INFO: Merging results...
INFO: The resulting table will have 8 rows and 6 columns (in 2 run sets).
INFO: Generating table...
INFO: Writing CSV  into table-defs/validation.table.csv ...
INFO: done
Generated validation.csv successfully
Copying data for plots...
mkdir -p analysis/raw/
cp table-defs/portfolio-cpachecker.table.csv analysis/raw/
cp table-defs/portfolio-esbmc-kind.table.csv analysis/raw/
cp table-defs/portfolio-uautomizer.table.csv analysis/raw/
cp table-defs/portfolio-symbiotic.table.csv analysis/raw/
cp table-defs/portfolio-bubaak.table.csv analysis/raw/
cp table-defs/portfolio-cpv.table.csv analysis/raw/
cp data/reach_safety.table.csv analysis/raw/
cp results-verified/META_Cover-Error.table.csv analysis/raw/
cp table-defs/creation.table.csv analysis/raw/
cp table-defs/validation.table.csv analysis/raw/
Copied data successfully.
Creating plots...
cd analysis && python3 plot.py
RQ1: Converted tasks (done): 8
< 1.0s: 4/8 (50.00%)
< 0.2s: 0/8 (0.00%)
Created plots successfully.
Plots are in the analysis/final_plots/ directory

Inspecting Table 2 shows the updated results:

cat analysis/final_plots/Table02.tex

Expected Output:

\begin{tabular}{l S[table-format=5.0] S[table-format=5.0] S[table-format=5.0] S[table-format=3.1]}
\toprule
Tester & \multicolumn{1}{c}{\#\,Suites} & \multicolumn{1}{c}{\#\,Converted} & \multicolumn{1}{c}{\#\,Confirmed} & \multicolumn{1}{c}{\%\,Confirmed}\\
\midrule
esbmc-incr & 1 & 1 & 1 & 100.0\\
fusebmc & 1 & 1 & 0 & 0.0\\
hybridtiger & 1 & 1 & 1 & 100.0\\
kleef & 1 & 1 & 1 & 100.0\\
owic & 1 & 1 & 1 & 100.0\\
tracerx-wp & 1 & 1 & 1 & 100.0\\
utestgen & 1 & 1 & 1 & 100.0\\
wasp-c & 1 & 1 & 1 & 100.0\\
\midrule
Total & 8 & 8 & 7 & 87.5\\
\bottomrule

3.3 RQ2: Test Generators in SV-COMP

This step is empirical and uses data from previous competitions: Test-Comp 25 and SV-COMP 25. Both competitions provide respective artifacts. We include the subset necessary to run our evaluation in this artifact.

This is for reference only; you do not need to download either of these.
Artifact of Test-Comp 25: DOI 10.5282/ubm/data.667 (107 GB)
Artifact of SV-COMP 25: DOI 10.5281/zenodo.15012085 (54 GB)

If you ran the plot generation in Step 2, you are already done: Figure07 and Figure08 show our results from the empirical analysis.

If you did not do Step 2:
⚠️ This command will remove results you already generated into results and results-validate. If you wish to keep those results, move them somewhere outside the fase26 folder.
⚠️ Restoring the files takes about 5 min. The finished size is about 8.1 GB.

rm -r results results-validate
make restore-experiment-data
make -B table-defs/validation.table.csv table-defs/creation.table.csv
make plots

Expected Output:

Restoring experiment data...
Restoring from backups/experiment-data.tar.gz
8.17GiB 0:04:50 [28.8MiB/s] [                           <=>                                                                                                                            ]
Generating validation.xml...
Generated validation.xml successfully
Generating validation.csv...
PYTHONPATH=tool_info benchexec/bin/table-generator -x table-defs/validation.xml --no-diff -o table-defs -f csv
INFO: Reading table definition from 'table-defs/validation.xml'...
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.cetfuzz-coverage-error-call-confirmed.cetfuzz-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.cetfuzz-coverage-error-call-confirmed.cetfuzz-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.esbmc-kind-coverage-error-call-confirmed.esbmc-kind-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.esbmc-kind-coverage-error-call-confirmed.esbmc-kind-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.fdse-coverage-error-call-confirmed.fdse-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.fdse-coverage-error-call-confirmed.fdse-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.fizzer-coverage-error-call-confirmed.fizzer-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.fizzer-coverage-error-call-confirmed.fizzer-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.fusebmc-coverage-error-call-confirmed.fusebmc-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.fusebmc-coverage-error-call-confirmed.fusebmc-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.fusebmc-ia-coverage-error-call-confirmed.fusebmc-ia-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.fusebmc-ia-coverage-error-call-confirmed.fusebmc-ia-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.klee-coverage-error-call-confirmed.klee-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.klee-coverage-error-call-confirmed.klee-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.kleef-coverage-error-call-confirmed.kleef-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.kleef-coverage-error-call-confirmed.kleef-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.legion-coverage-error-call-confirmed.legion-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.legion-symcc-coverage-error-call-confirmed.legion-symcc-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.owic-coverage-error-call-confirmed.owic-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.prtest-coverage-error-call-confirmed.prtest-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.legion-coverage-error-call-confirmed.legion-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.legion-symcc-coverage-error-call-confirmed.legion-symcc-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.owic-coverage-error-call-confirmed.owic-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.rizzer-coverage-error-call-confirmed.rizzer-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.prtest-coverage-error-call-confirmed.prtest-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.rizzer-coverage-error-call-confirmed.rizzer-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.symbiotic-coverage-error-call-confirmed.symbiotic-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.symbiotic-coverage-error-call-confirmed.symbiotic-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.tracerx-coverage-error-call-confirmed.tracerx-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.tracerx-coverage-error-call-confirmed.tracerx-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.utestgen-coverage-error-call-confirmed.utestgen-test-suites.xml.bz2
INFO:     table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.wasp-c-coverage-error-call-confirmed.wasp-c-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.utestgen-coverage-error-call-confirmed.utestgen-test-suites.xml.bz2
INFO:     table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.wasp-c-coverage-error-call-confirmed.wasp-c-test-suites.xml.bz2
INFO: Merging results...
INFO: The resulting table will have 10941 rows and 6 columns (in 2 run sets).
INFO: Generating table...
INFO: Writing CSV  into table-defs/validation.table.csv ...
INFO: done
Generated validation.csv successfully
Generating creation.table.csv...
PYTHONPATH=tool_info benchexec/bin/table-generator -x table-defs/creation.xml -f csv --no-diff -o table-defs/
INFO: Reading table definition from 'table-defs/creation.xml'...
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.cetfuzz-coverage-error-call-confirmed.cetfuzz-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.esbmc-kind-coverage-error-call-confirmed.esbmc-kind-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.fdse-coverage-error-call-confirmed.fdse-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.fizzer-coverage-error-call-confirmed.fizzer-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.fusebmc-coverage-error-call-confirmed.fusebmc-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.fusebmc-ia-coverage-error-call-confirmed.fusebmc-ia-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.klee-coverage-error-call-confirmed.klee-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.kleef-coverage-error-call-confirmed.kleef-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.legion-coverage-error-call-confirmed.legion-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.legion-symcc-coverage-error-call-confirmed.legion-symcc-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.owic-coverage-error-call-confirmed.owic-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.prtest-coverage-error-call-confirmed.prtest-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.rizzer-coverage-error-call-confirmed.rizzer-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.symbiotic-coverage-error-call-confirmed.symbiotic-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.tracerx-coverage-error-call-confirmed.tracerx-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.utestgen-coverage-error-call-confirmed.utestgen-test-suites.xml.bz2
INFO:     table-defs/../results/test2witness.2025-10-16_10-29-27.results.wasp-c-coverage-error-call-confirmed.wasp-c-test-suites.xml.bz2
INFO: Merging results...
INFO: The resulting table will have 10941 rows and 3 columns (in 1 run sets).
INFO: Generating table...
INFO: Writing CSV  into table-defs/creation.table.csv ...
INFO: done
Generated creation.table.csv successfully
Copying data for plots...
mkdir -p analysis/raw/
cp table-defs/portfolio-cpachecker.table.csv analysis/raw/
cp table-defs/portfolio-esbmc-kind.table.csv analysis/raw/
cp table-defs/portfolio-uautomizer.table.csv analysis/raw/
cp table-defs/portfolio-symbiotic.table.csv analysis/raw/
cp table-defs/portfolio-bubaak.table.csv analysis/raw/
cp table-defs/portfolio-cpv.table.csv analysis/raw/
cp data/reach_safety.table.csv analysis/raw/
cp results-verified/META_Cover-Error.table.csv analysis/raw/
cp table-defs/creation.table.csv analysis/raw/
cp table-defs/validation.table.csv analysis/raw/
Copied data successfully.
Creating plots...
cd analysis && python3 plot.py
RQ1: Converted tasks (done): 10897
< 1.0s: 10490/10897 (96.27%)
< 0.2s: 8069/10897 (74.05%)
Created plots successfully.
Plots are in the analysis/final_plots/ directory

After this command has completed, you may compare Figures 07 and 08 with the ones in the paper or in the folder ~/fase26/expected.
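The percentages in the RQ1 summary lines printed by plot.py can be rechecked from the raw counts. The following is a minimal sketch that uses only the numbers from the expected output above; the helper name `share` is hypothetical and not part of plot.py.

```python
def share(part: int, total: int) -> str:
    """Format part/total in the style of the RQ1 summary lines."""
    return f"{part}/{total} ({100 * part / total:.2f}%)"


# Counts copied from the expected output of the plot step.
converted = 10897
print("< 1.0s:", share(10490, converted))
print("< 0.2s:", share(8069, converted))
```

If your own run reports different counts, substitute them to see whether the shares stay close to the ones reported in the paper.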

3.4 RQ3: Efficiency of Parallel Portfolio Experiments

In RQ3 we claim that, while the portfolio on average consumes more CPU time than the verifier alone, there are tasks on which the portfolio consumes less. We demonstrate this with one example, using uautomizer and afltc.

First, back up the portfolio results:

cd $HOME/fase26
mv results-portfolio results-portfolio.bak

Then we need to make a small adjustment such that AFL (part of the afltc tester) works:

echo core | sudo tee /proc/sys/kernel/core_pattern

When prompted for the password enter: fase26
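To confirm that the adjustment took effect before starting the run, the current setting can be read back. The sketch below is an illustration only: the function name `core_pattern_ok` is hypothetical, and the rule it checks (AFL refuses to start when core dumps are piped to an external program, that is, when the pattern starts with `|`) is an assumption about AFL's startup check rather than something stated in this artifact.

```python
from pathlib import Path


def core_pattern_ok(path: str = "/proc/sys/kernel/core_pattern") -> bool:
    """Return True if core dumps are NOT piped to an external handler."""
    pattern = Path(path).read_text().strip()
    # Assumption: a leading '|' (pipe to a crash handler) is what AFL rejects.
    return not pattern.startswith("|")
```

Inside the VM, calling `core_pattern_ok()` after the `echo core | sudo tee ...` command should return True, since the pattern is then the plain string `core`.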

To execute this single-task example, run:

make portfolio-rq3

This runs Ultimate Automizer in a portfolio with afltc and then Ultimate Automizer standalone, both on the task sv-benchmarks/c/array-fpi/eqn1f.yml.

Expected Output:

PYTHONPATH=combinations benchexec/bin/benchexec --read-only-dir=/ --hidden-dir /tmp --overlay-dir=/home -o results-portfolio/  --tool-directory combinations benchmark-defs/combinations/parallel-uautomizer-afltc-rq3.xml 
2026-01-08 15:36:23 - WARNING - Cannot determine combinations/ppexec version, error output: mkdir: cannot create directory ‘/sys/fs/cgroup/init’: Read-only file system
mkdir: cannot create directory ‘/sys/fs/cgroup/benchexec’: Read-only file system

2026-01-08 15:36:23 - WARNING - Ignoring specified resource requirements in local-execution mode, only resource limits are used.
2026-01-08 15:36:23 - INFO - Unable to find pqos_wrapper, please install it for cache allocation and monitoring if your CPU supports Intel RDT (cf. https://gitlab.com/sosy-lab/software/pqos-wrapper).

executing run set 'pp-uautomizer-afltc.eqn1f'     (1 file)
15:36:23   eqn1f.yml               false                       41.73   18.45

Statistics:              1 Files
  correct:               1
    correct true:        0
    correct false:       1
  incorrect:             0
    incorrect true:      0
    incorrect false:     0
  unknown:               0
  Score:                 1 (max: 1)

In order to get HTML and CSV tables, run
benchexec/bin/table-generator results-portfolio/parallel-uautomizer-afltc-rq3.2026-01-08_15-36-23.results.pp-uautomizer-afltc.eqn1f.xml.bz2
PYTHONPATH=combinations benchexec/bin/benchexec --read-only-dir=/ --hidden-dir /tmp --overlay-dir=/home -o results-portfolio/  --tool-directory combinations -o results-baseline/ benchmark-defs/combinations/uautomizer-rq3.xml 
2026-01-08 15:36:43 - WARNING - Cannot determine combinations/ppexec version, error output: mkdir: cannot create directory ‘/sys/fs/cgroup/init’: Read-only file system
mkdir: cannot create directory ‘/sys/fs/cgroup/benchexec’: Read-only file system

2026-01-08 15:36:43 - WARNING - Ignoring specified resource requirements in local-execution mode, only resource limits are used.
2026-01-08 15:36:43 - INFO - Unable to find pqos_wrapper, please install it for cache allocation and monitoring if your CPU supports Intel RDT (cf. https://gitlab.com/sosy-lab/software/pqos-wrapper).

executing run set 'uautomizer-reference.eqn1f'     (1 file)
15:36:43   eqn1f.yml               false                       83.05   48.93

Statistics:              1 Files
  correct:               1
    correct true:        0
    correct false:       1
  incorrect:             0
    incorrect true:      0
    incorrect false:     0
  unknown:               0
  Score:                 1 (max: 1)

In order to get HTML and CSV tables, run
benchexec/bin/table-generator results-baseline/uautomizer-rq3.2026-01-08_15-36-43.results.uautomizer-reference.eqn1f.xml.bz2

The first run shows the portfolio of uautomizer and afltc which consumes (in the example above) 41.73s of CPU time. Compare that to the second run, which is uautomizer alone and consumes 83.05s of CPU time.
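The comparison can also be made programmatically from the two result lines in the console output. This is a minimal sketch, assuming that the two trailing numbers on each line are CPU time and wall time in seconds (in that order); the function name `parse_run_line` is hypothetical and not part of BenchExec.

```python
def parse_run_line(line: str) -> dict:
    """Split one console result line into its whitespace-separated fields."""
    start, task, status, cpu, wall = line.split()
    return {
        "start": start,
        "task": task,
        "status": status,
        "cputime_s": float(cpu),    # assumed: first number is CPU time
        "walltime_s": float(wall),  # assumed: second number is wall time
    }


# Result lines copied from the expected output above.
portfolio = parse_run_line("15:36:23   eqn1f.yml               false                       41.73   18.45")
baseline = parse_run_line("15:36:43   eqn1f.yml               false                       83.05   48.93")

saved = baseline["cputime_s"] - portfolio["cputime_s"]
print(f"portfolio saves {saved:.2f}s of CPU time "
      f"({100 * saved / baseline['cputime_s']:.1f}% of the baseline)")
```

The measured times will differ from machine to machine, so only the direction of the difference is expected to match, not the exact values.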

3.5 RQ4: Effectiveness of Parallel Portfolio Experiments

In RQ4 we claim that the portfolios can solve tasks that the verifier alone cannot solve. To show this, we select one task, sv-benchmarks/c/reducercommutativity/rangesum.yml, and the portfolio of cpachecker and fusebmc.

⚠️ We assume you have moved results-portfolio to results-portfolio.bak in Section 3.4.

To run the example task execute:

make portfolio-rq4

This will first run the portfolio cpachecker-fusebmc on the task, then cpachecker alone.
⚠️ The expected behavior is that cpachecker alone will time out after 900s, so this target will take a little over 16 min to complete.

Expected Output:

PYTHONPATH=combinations benchexec/bin/benchexec --read-only-dir=/ --hidden-dir /tmp --overlay-dir=/home -o results-portfolio/  --tool-directory combinations benchmark-defs/combinations/parallel-cpachecker-fusebmc-rq4.xml 
2026-01-08 15:42:53 - WARNING - Cannot determine combinations/ppexec version, error output: mkdir: cannot create directory ‘/sys/fs/cgroup/init’: Read-only file system
mkdir: cannot create directory ‘/sys/fs/cgroup/benchexec’: Read-only file system

2026-01-08 15:42:53 - WARNING - Ignoring specified resource requirements in local-execution mode, only resource limits are used.
2026-01-08 15:42:53 - INFO - Unable to find pqos_wrapper, please install it for cache allocation and monitoring if your CPU supports Intel RDT (cf. https://gitlab.com/sosy-lab/software/pqos-wrapper).

executing run set 'pp-cpachecker-fusebmc.rangesum'     (1 file)
15:42:53   rangesum.yml            false                       35.76   19.78

Statistics:              1 Files
  correct:               1
    correct true:        0
    correct false:       1
  incorrect:             0
    incorrect true:      0
    incorrect false:     0
  unknown:               0
  Score:                 1 (max: 1)

In order to get HTML and CSV tables, run
benchexec/bin/table-generator results-portfolio/parallel-cpachecker-fusebmc-rq4.2026-01-08_15-42-53.results.pp-cpachecker-fusebmc.rangesum.xml.bz2
PYTHONPATH=combinations benchexec/bin/benchexec --read-only-dir=/ --hidden-dir /tmp --overlay-dir=/home -o results-portfolio/  --tool-directory combinations -o results-baseline/ benchmark-defs/combinations/cpachecker-rq4.xml 
2026-01-08 15:43:14 - WARNING - Cannot determine combinations/ppexec version, error output: mkdir: cannot create directory ‘/sys/fs/cgroup/init’: Read-only file system
mkdir: cannot create directory ‘/sys/fs/cgroup/benchexec’: Read-only file system

2026-01-08 15:43:14 - WARNING - Ignoring specified resource requirements in local-execution mode, only resource limits are used.
2026-01-08 15:43:14 - INFO - Unable to find pqos_wrapper, please install it for cache allocation and monitoring if your CPU supports Intel RDT (cf. https://gitlab.com/sosy-lab/software/pqos-wrapper).

executing run set 'cpachecker-reference.rangesum'     (1 file)
15:43:14   rangesum.yml            TIMEOUT                    902.80  869.79

Statistics:              1 Files
  correct:               0
    correct true:        0
    correct false:       0
  incorrect:             0
    incorrect true:      0
    incorrect false:     0
  unknown:               1
  Score:                 0 (max: 1)

In order to get HTML and CSV tables, run
benchexec/bin/table-generator results-baseline/cpachecker-rq4.2026-01-08_15-43-14.results.cpachecker-reference.rangesum.xml.bz2

We can see the portfolio solves the task in 35.76s CPU time, while cpachecker alone runs into a timeout.
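The criterion behind this comparison can be written down as a small predicate. This is a sketch under stated assumptions: a task counts as solved when the reported status is a verdict (`true` or `false`, optionally followed by a property in parentheses), and anything else, such as `TIMEOUT`, counts as unsolved. It deliberately ignores whether the verdict is correct, and the function name is hypothetical.

```python
SOLVED_VERDICTS = {"true", "false"}


def is_solved(status: str) -> bool:
    """True if the status is a verdict rather than TIMEOUT, ERROR, etc."""
    # Strip an optional property suffix such as 'false(unreach-call)'.
    return status.split("(")[0].strip() in SOLVED_VERDICTS


def additionally_solved(portfolio_status: str, baseline_status: str) -> bool:
    """True if the portfolio solved a task that the baseline did not."""
    return is_solved(portfolio_status) and not is_solved(baseline_status)


# Statuses copied from the expected output above.
print(additionally_solved("false", "TIMEOUT"))
```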

4. Running the Full Set of Experiments

⚠️ Warning: Long-running experiments

The full experimental evaluation is computationally expensive:

Witness generation may take 10–20 minutes inside the VM.
Witness validation may take several hours up to a full day.
The parallel portfolio experiments will take multiple days to weeks on a 4-core setup.

Proceed only if you are prepared for these runtimes.

4.1 RQ1: Full Witness Generation

Follow the setup from 3.1. Then, instead of the small- variant, run the full experiment:

cd $HOME/fase26
make witnesses-from-testsuites

4.2 RQ1: Full Witness Validation

Follow the setup from 3.2. Then, instead of the small- variant, run the full experiment:

make validate-generated-witnesses

⚠️ This benchmark tries to match the entire sv-benchmarks suite against the existing witnesses. Tasks for which no witness was generated therefore produce many warnings of the form:

2026-01-08 20:45:21 - WARNING - No files found matching '../results/test2witness.2025-10-16_10-29-27.files/${rundefinition_name}/${taskdef_name}/output/witness.yml'.
2026-01-08 20:45:21 - WARNING - Pattern ../results/test2witness.2025-10-16_10-29-27.files/${rundefinition_name}/${taskdef_name}/output/witness.yml in requiredfiles tag did not match any file for task sv-benchmarks/c/xcsp/aim-200-6-0-sat-4.yml.

This is normal and expected.
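Because these expected warnings can bury other messages, it may help to filter them out of a saved log. The sketch below assumes the console output was captured as a list of lines; the function name `unexpected_warnings` and the regular expression are illustrative and were written against the two example messages above only.

```python
import re

# Matches the two expected 'missing witness' warnings shown above.
EXPECTED = re.compile(
    r"WARNING - (No files found matching '.*witness\.yml'\."
    r"|Pattern .*witness\.yml in requiredfiles tag did not match any file"
    r" for task .*\.yml\.)$"
)


def unexpected_warnings(log_lines):
    """Return WARNING lines that are not one of the expected messages."""
    result = []
    for line in log_lines:
        line = line.rstrip()
        if " - WARNING - " in line and not EXPECTED.search(line):
            result.append(line)
    return result
```

Any line this function returns is worth a closer look, since it is a warning other than the two documented ones.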

4.3 RQ2: Test Generators in SV-COMP

The steps are identical to 3.3. No additional steps are needed.

4.4 RQ3 & RQ4: Full Parallel Portfolio Evaluation

Follow the setup from 3.4. Then, instead of only the example tasks, run the full experiment.

⚠️ Warning: Even a single combination will take several hours to days to complete on a 4-core VM.

Evaluated Tools

Verifiers	Test Generators
CPAchecker	prtest
UAutomizer	fusebmc
ESBMC-kind	fizzer
Symbiotic	afltc
Bubaak
CPV

Each combination corresponds to a Make target of the form <verifier>-<test-generator>.

To run a specific combination, for example CPAchecker with prtest, execute:

make cpachecker-prtest

To run all 24 evaluated combinations, execute:

make par-all
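The 24 combinations follow from pairing each of the 6 verifiers with each of the 4 test generators. The sketch below enumerates the resulting target names. Only `cpachecker-prtest` is confirmed by the text above; the lowercase spellings of the other verifiers are an assumption taken from the portfolio table file names (for example portfolio-esbmc-kind.table.csv), so check the Makefile before relying on them.

```python
from itertools import product

# Assumed target spellings; only 'cpachecker-prtest' is confirmed above.
VERIFIERS = ["cpachecker", "uautomizer", "esbmc-kind", "symbiotic", "bubaak", "cpv"]
TEST_GENERATORS = ["prtest", "fusebmc", "fizzer", "afltc"]

# One Make target per <verifier>-<test-generator> pair.
targets = [f"{v}-{t}" for v, t in product(VERIFIERS, TEST_GENERATORS)]

for target in targets:
    print("make", target)
```

Running single targets from such a list, rather than `make par-all`, makes it easier to spread the combinations over several machines or sessions.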

4.5 Recreating tables and plots

To completely recreate the plots and their required data (after the experiments have run), use the command below.

⚠️ Note: This runs a Python script that permanently updates the table-defs/portfolio-*.xml files to include the latest portfolio runs. We recommend creating a backup of table-defs first: cp -r table-defs table-defs.bak

make -B plots

Files (35.7 GB)

Name	MD5	Size
LICENSE.txt	c846ebb396f8b174b10ded4771514fcc	10.3 kB
README.md	3c1acf1aa96a894ab1559fafb41cc7d8	50.5 kB
REQUIREMENTS.md	0d3653893d8c5442ed69325833c969d8	213 Bytes
STATUS.md	e80e1b7154684d2e687e096104a4db0b	853 Bytes
Test-to-Witness.ova	734d444cdddb758e819ed5fb9a66b355	35.7 GB

