Artifact for the FASE 26 Paper: Testing in Formal Verification via Witness Generation (Empirical Evaluation)
This artifact guides you through reproducing the plots and tables presented in the paper and through rerunning the experimental evaluation.
The artifact is shipped as a VirtualBox VM.
- The artifact is contained within an Ubuntu 24.04 VM.
- We require at least 16 GB of memory and 4 cores.
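The resource requirements can be checked from inside the running VM. The following is a minimal sketch, assuming a Linux guest that exposes /proc/meminfo. The function names and the 10% slack are our assumptions; the slack is there because a guest usually reports slightly less memory than was allocated to it.

```python
import os

REQUIRED_CORES = 4     # from the requirements stated above
REQUIRED_MEM_GB = 16   # from the requirements stated above


def parse_mem_total_kib(meminfo_text):
    """Return the MemTotal value (in KiB) from the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal entry not found")


def meets_requirements(mem_kib, cores, slack=0.9):
    """Check memory and core count; `slack` is an assumed tolerance factor."""
    mem_gib = mem_kib / (1024 * 1024)
    return cores >= REQUIRED_CORES and mem_gib >= REQUIRED_MEM_GB * slack


if __name__ == "__main__":
    # Only attempt the check where /proc/meminfo exists (Linux).
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            mem_kib = parse_mem_total_kib(f.read())
        cores = os.cpu_count() or 0
        print(f"memory: {mem_kib / 1024 / 1024:.1f} GiB, cores: {cores}")
        ok = meets_requirements(mem_kib, cores)
        print("requirements met" if ok else "requirements NOT met")
```

If the check fails, the VM settings (memory and processors) can be raised in VirtualBox while the VM is powered off.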
Getting Started
Username and Password
user: test-to-witness
password: fase26
This artifact contains:
- analysis/ : Files concerning plots
- backup/ : A backup of the experiment data
- benchexec/ : The benchmarking tool
- benchmark-defs/ : Files defining benchmarks
- benchmark-sets/ : Files describing sets of tasks
- combinations/ : Code of the portfolio
- data/ : data from SV-COMP 25
- expected/ : The expected plots
- Makefile
- results/ : The witnesses generated by test-to-witness
- results-baseline/ : results of the baseline (just the verifier) for the portfolios
- results-portfolio/ : results of the portfolio experiments
- results-validate/ : results of the validation of generated witnesses
- results-verified/ : Data from Test-Comp 25
- sv-benchmarks/ : The sv-benchmark tasks
- table-defs/ : Files defining how to create csv tables from the raw experiment data
- test-to-witness/ : The tool test-to-witness
- tool_info/ : Files telling benchexec how to run our tools
- validators/ : The validators used in the experiment
If you are only interested in the tool Test-To-Witness, you may navigate into the test-to-witness folder, which includes a README on how to use test-to-witness. Please note: We did not install a Go distribution in the VM, so it is not possible to recompile test-to-witness inside the VM. For the most up-to-date version, we refer readers to our public GitLab repository: https://gitlab.com/sosy-lab/software/test-to-witness.
This artifact document consists of the following steps:
- Downloading and installing the VM
- Recreating the plots and tables from the paper
- Running a subset of the experiments
- Running the full set of experiments
For Getting Started, we recommend completing Sections 1, 2, and 3.1. Together, these should take less than 30 min.
Sections 2, 3 and 4 also serve as the Step-by-Step guide.
1. Download and Installation
Instructions
- Download the artifact VM from Zenodo using the DOI provided with the paper. The download consists of a single VirtualBox appliance file with the extension .ova.
- Install Oracle VirtualBox (version 7 or newer is recommended) from the official website.
- Start VirtualBox and select File → Import Appliance.
- Choose the downloaded .ova file and follow the import wizard. You may adjust the VM settings during import; we require allocating at least 16 GB of RAM and 4 CPU cores.
- After the import finishes, select the VM from the list and click Start to launch it.
- Log in to the VM using the credentials: user: test-to-witness, password: fase26
Inside the VM
To keep the VM download as small as possible, we compressed the data inside the VM. Before starting the evaluation, please decompress the data. This takes approximately 15-30 min. The decompressed data occupies approximately 49 GB.
pigz -dc fase26.tar.gz | pv | tar xf -
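Before decompressing, it can help to confirm that the disk has enough free space. The following is a minimal sketch using Python's standard shutil.disk_usage; the helper name has_enough_space is our own, and the threshold is the approximate size stated above, interpreted as decimal gigabytes.

```python
import shutil

REQUIRED_GB = 49  # approximate decompressed size stated above


def has_enough_space(free_bytes, required_gb=REQUIRED_GB):
    """Return True if `free_bytes` covers `required_gb` decimal gigabytes."""
    return free_bytes >= required_gb * 10**9


if __name__ == "__main__":
    # Inspect the file system that holds the current directory.
    free = shutil.disk_usage(".").free
    print(f"free: {free / 10**9:.1f} GB, needed: about {REQUIRED_GB} GB")
    print("ok" if has_enough_space(free) else "not enough free space")
```

Run it from the directory that contains fase26.tar.gz so that the correct file system is inspected.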
2. Recreating the Plots and Tables of the Paper
To recreate the plots, first open a terminal and change into the directory containing all data and scripts:
cd $HOME/fase26
To generate all plots and tables used in the paper, run the following command (this takes approximately 1–2 minutes):
make plots
Expected Output:
Copying data for plots...
mkdir -p analysis/raw/
cp table-defs/portfolio-cpachecker.table.csv analysis/raw/
cp table-defs/portfolio-esbmc-kind.table.csv analysis/raw/
cp table-defs/portfolio-uautomizer.table.csv analysis/raw/
cp table-defs/portfolio-symbiotic.table.csv analysis/raw/
cp table-defs/portfolio-bubaak.table.csv analysis/raw/
cp table-defs/portfolio-cpv.table.csv analysis/raw/
cp data/reach_safety.table.csv analysis/raw/
cp results-verified/META_Cover-Error.table.csv analysis/raw/
cp table-defs/creation.table.csv analysis/raw/
cp table-defs/validation.table.csv analysis/raw/
Copied data successfully.
Creating plots...
cd analysis && python3 plot.py
RQ1: Converted tasks (done): 10897
< 1.0s: 10490/10897 (96.27%)
< 0.2s: 8069/10897 (74.05%)
Created plots successfully.
Plots are in the analysis/final_plots/ directory
The generated plots are located in analysis/final_plots/.
For convenience, we also provide the expected plots and tables in the directory expected/ so that you can directly compare results.
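The comparison with expected/ can also be scripted. The following is a minimal sketch, assuming that expected/ is a flat directory whose file names mirror those in analysis/final_plots/ (we have not verified this layout here). It only compares .tex tables byte by byte; PDF plots are merely checked for presence, because embedded metadata may differ between runs.

```python
import filecmp
from pathlib import Path


def compare_outputs(generated_dir, expected_dir):
    """Return (missing, differing) relative to the expected directory.

    missing:   file names present in expected_dir but not in generated_dir
    differing: .tex files present in both directories whose bytes differ
    """
    generated = Path(generated_dir)
    expected = Path(expected_dir)
    missing, differing = [], []
    for exp_file in sorted(p for p in expected.iterdir() if p.is_file()):
        gen_file = generated / exp_file.name
        if not gen_file.is_file():
            missing.append(exp_file.name)
        elif exp_file.suffix == ".tex" and not filecmp.cmp(
            exp_file, gen_file, shallow=False
        ):
            differing.append(exp_file.name)
    return missing, differing


if __name__ == "__main__":
    gen, exp = Path("analysis/final_plots"), Path("expected")
    if gen.is_dir() and exp.is_dir():
        missing, differing = compare_outputs(gen, exp)
        print("missing:", missing)
        print("differing tables:", differing)
```

After a smoke-test run (Section 3), differences are expected, because the tables are then built from only a few data points.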
3. Running a Subset of the Experiments
Because the full experiments are time-consuming, we provide a smoke-test mode that executes a small but representative subset of the experiments.
There are three experiments in the paper:
- Creating witnesses from test suites
- Validating the generated witnesses
- Running parallel verifier–test-generator portfolios
Backup of Results
The experiments write to the same files that the original experiments conducted by the authors wrote to. The benchmarking utility benchexec terminates with an error if one of the old experiment files is still present. In the following steps, we therefore manually move the result directories out of the way before each new set of experiments. An additional backup of all original results is stored in backups/backup.tar.gz. It can be restored with make restore-experiment-data.
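Moving a result directory aside has one pitfall: if a backup directory such as results.bak already exists, a plain mv places the directory inside the existing backup instead of replacing it. The following is a minimal sketch of a guarded rename; the helper move_aside is hypothetical and not part of the artifact.

```python
from pathlib import Path


def move_aside(directory, suffix=".bak"):
    """Rename `directory` to `directory + suffix` without touching old backups.

    Returns the backup path, or None if `directory` does not exist.
    Raises FileExistsError if the backup path is already taken.
    """
    src = Path(directory)
    dst = src.with_name(src.name + suffix)
    if not src.exists():
        return None
    if dst.exists():
        raise FileExistsError(f"{dst} already exists; move or remove it first")
    src.rename(dst)
    return dst


# Example (run from $HOME/fase26), equivalent to `mv results results.bak`
# but refusing to nest into an existing results.bak:
#     move_aside("results")
```

The helper deliberately raises an error instead of deleting anything, so an earlier backup is never lost silently.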
3.1 RQ1: Creating Witnesses from Test Suites
To generate witnesses from test cases, execute:
cd $HOME/fase26
mv results results.bak
make small-witnesses-from-testsuites
Expected Output:
PYTHONPATH=tool_info benchexec/bin/benchexec --read-only-dir=/ --hidden-dir /tmp --overlay-dir=/home --no-compress-results -o results/ --tool-directory test-to-witness --startTime '2025-10-16 10:29:27' benchmark-defs/small/test2witness.xml
2026-01-08 14:09:59 - WARNING - No propertyfile specified. Score computation will ignore the results.
2026-01-08 14:09:59 - INFO - Unable to find pqos_wrapper, please install it for cache allocation and monitoring if your CPU supports Intel RDT (cf. https://gitlab.com/sosy-lab/software/pqos-wrapper).
executing run set 'esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites' (1 file)
14:09:59 06_fuzzle_50x50_0-cycle.yml done 2.30 2.67
executing run set 'fusebmc-coverage-error-call-confirmed.fusebmc-test-suites' (1 file)
14:10:02 array_ptr_single_elem_init-2.yml done 5.64 3.54
executing run set 'hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites' (1 file)
14:10:06 btor2c-lazyMod.driving_phils.3.prop1-func-interl.yml done 1.33 1.72
executing run set 'kleef-coverage-error-call-confirmed.kleef-test-suites' (1 file)
14:10:08 s4liff.yml done 0.26 0.49
executing run set 'owic-coverage-error-call-confirmed.owic-test-suites' (1 file)
14:10:08 elevator_spec3_product23.cil.yml done 0.33 0.57
executing run set 'tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites' (1 file)
14:10:09 list-2.yml done 175.25 167.49
executing run set 'utestgen-coverage-error-call-confirmed.utestgen-test-suites' (1 file)
14:12:57 rangesum.yml done 0.26 0.44
executing run set 'wasp-c-coverage-error-call-confirmed.wasp-c-test-suites' (1 file)
14:12:58 brs5f.yml done 0.21 0.70
Statistics: 8 Files
correct: 0
correct true: 0
correct false: 0
incorrect: 0
incorrect true: 0
incorrect false: 0
unknown: 0
In order to get HTML and CSV tables, run
benchexec/bin/table-generator results/test2witness.2025-10-16_10-29-27.results.esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites.xml results/test2witness.2025-10-16_10-29-27.results.fusebmc-coverage-error-call-confirmed.fusebmc-test-suites.xml results/test2witness.2025-10-16_10-29-27.results.hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites.xml results/test2witness.2025-10-16_10-29-27.results.kleef-coverage-error-call-confirmed.kleef-test-suites.xml results/test2witness.2025-10-16_10-29-27.results.owic-coverage-error-call-confirmed.owic-test-suites.xml results/test2witness.2025-10-16_10-29-27.results.tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites.xml results/test2witness.2025-10-16_10-29-27.results.utestgen-coverage-error-call-confirmed.utestgen-test-suites.xml results/test2witness.2025-10-16_10-29-27.results.wasp-c-coverage-error-call-confirmed.wasp-c-test-suites.xml
This command generates witnesses from test suites produced by 8 different test-case generators. The selected tasks represent the range of runtimes that a conversion can take.
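The per-task lines of the console output can be turned into structured records for quick inspection. The following is a minimal sketch, assuming that the two trailing numbers are CPU time and wall time in seconds, in that order; this matches our reading of the BenchExec console output but is not stated in this document. The function name and dictionary keys are our own.

```python
import re

# clock time, task file, status (may contain spaces), two trailing numbers
RUN_LINE = re.compile(
    r"^(?P<clock>\d{2}:\d{2}:\d{2})\s+(?P<task>\S+)\s+(?P<status>.+?)\s+"
    r"(?P<cpu>\d+\.\d+)\s+(?P<wall>\d+\.\d+)\s*$"
)


def parse_run_line(line):
    """Parse one per-task output line; return None for any other line."""
    m = RUN_LINE.match(line.strip())
    if m is None:
        return None
    return {
        "task": m["task"],
        "status": m["status"],
        "cputime_s": float(m["cpu"]),    # assumed column meaning
        "walltime_s": float(m["wall"]),  # assumed column meaning
    }


if __name__ == "__main__":
    sample = [
        "executing run set 'kleef-coverage-error-call-confirmed.kleef-test-suites' (1 file)",
        "14:10:08 s4liff.yml done 0.26 0.49",
        "14:10:09 list-2.yml done 175.25 167.49",
    ]
    for line in sample:
        print(parse_run_line(line))
```

Lines that do not start with a clock time, such as the "executing run set" headers and the statistics block, yield None and can be filtered out.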
We recreate the creation.table.csv used by the plot script with our newly created data:
make -B table-defs/creation.table.csv
Expected Output:
Generating creation.table.csv...
benchexec/bin/table-generator -x table-defs/creation.xml -f csv --no-diff -o table-defs/
INFO: Reading table definition from 'table-defs/creation.xml'...
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.cetfuzz-coverage-error-call-confirmed.cetfuzz-test-suites.xml.bz2'.
INFO: table-defs/../results/test2witness.2025-10-16_10-29-27.results.esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites.xml.bz2
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.esbmc-kind-coverage-error-call-confirmed.esbmc-kind-test-suites.xml.bz2'.
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.fdse-coverage-error-call-confirmed.fdse-test-suites.xml.bz2'.
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.fizzer-coverage-error-call-confirmed.fizzer-test-suites.xml.bz2'.
INFO: table-defs/../results/test2witness.2025-10-16_10-29-27.results.fusebmc-coverage-error-call-confirmed.fusebmc-test-suites.xml.bz2
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.fusebmc-ia-coverage-error-call-confirmed.fusebmc-ia-test-suites.xml.bz2'.
INFO: table-defs/../results/test2witness.2025-10-16_10-29-27.results.hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites.xml.bz2
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.klee-coverage-error-call-confirmed.klee-test-suites.xml.bz2'.
INFO: table-defs/../results/test2witness.2025-10-16_10-29-27.results.kleef-coverage-error-call-confirmed.kleef-test-suites.xml.bz2
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.legion-coverage-error-call-confirmed.legion-test-suites.xml.bz2'.
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.legion-symcc-coverage-error-call-confirmed.legion-symcc-test-suites.xml.bz2'.
INFO: table-defs/../results/test2witness.2025-10-16_10-29-27.results.owic-coverage-error-call-confirmed.owic-test-suites.xml.bz2
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.prtest-coverage-error-call-confirmed.prtest-test-suites.xml.bz2'.
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.rizzer-coverage-error-call-confirmed.rizzer-test-suites.xml.bz2'.
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.symbiotic-coverage-error-call-confirmed.symbiotic-test-suites.xml.bz2'.
WARNING: No file matches 'table-defs/../results/test2witness.2025-10-16_10-29-27.results.tracerx-coverage-error-call-confirmed.tracerx-test-suites.xml.bz2'.
INFO: table-defs/../results/test2witness.2025-10-16_10-29-27.results.tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites.xml.bz2
INFO: table-defs/../results/test2witness.2025-10-16_10-29-27.results.utestgen-coverage-error-call-confirmed.utestgen-test-suites.xml.bz2
INFO: table-defs/../results/test2witness.2025-10-16_10-29-27.results.wasp-c-coverage-error-call-confirmed.wasp-c-test-suites.xml.bz2
WARNING: Missing module ".testtowitness", cannot extract values from log files (ImportError: No module named '.testtowitness').
INFO: Merging results...
INFO: The resulting table will have 8 rows and 3 columns (in 1 run sets).
INFO: Generating table...
INFO: Writing CSV into table-defs/creation.table.csv ...
INFO: done
Generated creation.table.csv successfully
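The "No file matches" warnings are expected in the smoke-test mode: the table definition lists every test generator of the full experiment, while the smoke test produces result files for only 8 of them. In the output above, 8 run sets are found and 12 are reported as missing. The following is a minimal sketch that separates the two groups from a saved copy of this log; the regular expression is derived from the file names shown above, and the function name is our own.

```python
import re

# Extracts the generator name from result file names of the form
# ...results.<generator>-coverage-error-call-confirmed.<generator>-test-suites.xml.bz2
GENERATOR = re.compile(
    r"\.results\.(?P<name>[\w-]+?)-coverage-error-call-confirmed\."
)


def split_run_sets(log_text):
    """Return (found, missing) generator names from table-generator output."""
    found, missing = [], []
    for line in log_text.splitlines():
        m = GENERATOR.search(line)
        if m is None:
            continue
        if line.startswith("WARNING: No file matches"):
            missing.append(m["name"])
        elif line.startswith("INFO:"):
            found.append(m["name"])
    return found, missing


if __name__ == "__main__":
    log = (
        "WARNING: No file matches 'table-defs/../results/test2witness."
        "2025-10-16_10-29-27.results.klee-coverage-error-call-confirmed."
        "klee-test-suites.xml.bz2'.\n"
        "INFO: table-defs/../results/test2witness.2025-10-16_10-29-27."
        "results.kleef-coverage-error-call-confirmed.kleef-test-suites.xml.bz2\n"
    )
    print(split_run_sets(log))
```

If a generator that is part of the smoke test appears in the missing group, the corresponding witness-generation run probably did not finish.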
After the table has been regenerated, we run the plot generation again:
make plots
Expected Output:
Copying data for plots...
mkdir -p analysis/raw/
cp table-defs/portfolio-cpachecker.table.csv analysis/raw/
cp table-defs/portfolio-esbmc-kind.table.csv analysis/raw/
cp table-defs/portfolio-uautomizer.table.csv analysis/raw/
cp table-defs/portfolio-symbiotic.table.csv analysis/raw/
cp table-defs/portfolio-bubaak.table.csv analysis/raw/
cp table-defs/portfolio-cpv.table.csv analysis/raw/
cp data/reach_safety.table.csv analysis/raw/
cp results-verified/META_Cover-Error.table.csv analysis/raw/
cp table-defs/creation.table.csv analysis/raw/
cp table-defs/validation.table.csv analysis/raw/
Copied data successfully.
Creating plots...
cd analysis && python3 plot.py
RQ1: Converted tasks (done): 8
< 1.0s: 4/8 (50.00%)
< 0.2s: 0/8 (0.00%)
Created plots successfully.
Plots are in the analysis/final_plots/ directory
Taking a look at analysis/final_plots/Figure06.pdf now shows an updated version of the plot using the few data points that were just created.
open analysis/final_plots/Figure06.pdf
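The RQ1 summary printed by the plot script can be cross-checked by hand. The following is a minimal sketch that recomputes the threshold shares from the first time column of the witness-generation output shown in this section. Which column plot.py actually reads is not stated here, so treating it as the CPU time is an assumption; the function name is our own.

```python
def share_below(times_s, threshold_s):
    """Return (count, total, percent) of times strictly below the threshold."""
    total = len(times_s)
    if total == 0:
        return 0, 0, 0.0
    count = sum(1 for t in times_s if t < threshold_s)
    return count, total, 100.0 * count / total


# First time column of the eight smoke-test runs shown in this section.
SMOKE_TIMES = [2.30, 5.64, 1.33, 0.26, 0.33, 175.25, 0.26, 0.21]

if __name__ == "__main__":
    for threshold in (1.0, 0.2):
        count, total, pct = share_below(SMOKE_TIMES, threshold)
        print(f"< {threshold}s: {count}/{total} ({pct:.2f}%)")
    # prints:
    # < 1.0s: 4/8 (50.00%)
    # < 0.2s: 0/8 (0.00%)
```

The numbers of your own run will differ slightly from the ones listed here, since runtimes depend on the host system.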
3.2 RQ1: Validating the Generated Witnesses
If you completed Step 3.1, you have two options:
1. Continue with the witnesses you just generated.
2. Restore and use the witnesses generated by the authors.
If you choose option 2, restore the original results. Warning: This will delete the current results directory.
[ -d results.bak ] && rm -r results && mv results.bak results
To validate witnesses using both validators employed in the paper (CPAchecker and Witch), run:
mv results-validate results-validate.bak
make small-validate-generated-witnesses
⚠️ This command runs for approximately 10-15 min, depending on your system.
Expected Output:
benchexec/bin/benchexec --read-only-dir=/ --hidden-dir /tmp --overlay-dir=/home -o results-validate/ --startTime '2025-08-05 23:18:21' --tool-directory validators/witch benchmark-defs/small/witch-validate-violation-witnesses-2.0.xml
2026-01-08 14:41:34 - WARNING - No propertyfile specified. Score computation will ignore the results.
2026-01-08 14:41:35 - WARNING - Ignoring specified resource requirements in local-execution mode, only resource limits are used.
2026-01-08 14:41:35 - INFO - Unable to find pqos_wrapper, please install it for cache allocation and monitoring if your CPU supports Intel RDT (cf. https://gitlab.com/sosy-lab/software/pqos-wrapper).
executing run set 'esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites' (1 file)
14:41:35 06_fuzzle_50x50_0-cycle.yml false(unreach-call) 24.89 24.96
executing run set 'fusebmc-coverage-error-call-confirmed.fusebmc-test-suites' (1 file)
14:42:00 array_ptr_single_elem_init-2.yml TIMEOUT (validation preprocessing) 90.81 91.04
executing run set 'hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites' (1 file)
14:43:32 btor2c-lazyMod.driving_phils.3.prop1-func-interl.yml false(unreach-call) 16.85 16.88
executing run set 'kleef-coverage-error-call-confirmed.kleef-test-suites' (1 file)
14:43:49 s4liff.yml false(unreach-call) 1.79 1.80
executing run set 'owic-coverage-error-call-confirmed.owic-test-suites' (1 file)
14:43:51 elevator_spec3_product23.cil.yml false(unreach-call) 2.09 2.10
executing run set 'tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites' (1 file)
14:43:53 list-2.yml false(unreach-call) 36.12 36.17
executing run set 'utestgen-coverage-error-call-confirmed.utestgen-test-suites' (1 file)
14:44:30 rangesum.yml false(unreach-call) 2.06 2.07
executing run set 'wasp-c-coverage-error-call-confirmed.wasp-c-test-suites' (1 file)
14:44:32 brs5f.yml false(unreach-call) 2.16 2.21
Statistics: 8 Files
correct: 0
correct true: 0
correct false: 0
incorrect: 0
incorrect true: 0
incorrect false: 0
unknown: 1
In order to get HTML and CSV tables, run
benchexec/bin/table-generator results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites.xml.bz2 results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.fusebmc-coverage-error-call-confirmed.fusebmc-test-suites.xml.bz2 results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites.xml.bz2 results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.kleef-coverage-error-call-confirmed.kleef-test-suites.xml.bz2 results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.owic-coverage-error-call-confirmed.owic-test-suites.xml.bz2 results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites.xml.bz2 results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.utestgen-coverage-error-call-confirmed.utestgen-test-suites.xml.bz2 results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.wasp-c-coverage-error-call-confirmed.wasp-c-test-suites.xml.bz2
benchexec/bin/benchexec --read-only-dir=/ --hidden-dir /tmp --overlay-dir=/home -o results-validate/ --startTime '2025-08-05 23:17:34' --tool-directory validators/cpachecker-4.0 benchmark-defs/small/cpachecker-validate-violation-witnesses-2.0.xml
2026-01-08 14:44:35 - WARNING - No propertyfile specified. Score computation will ignore the results.
2026-01-08 14:44:36 - WARNING - Ignoring specified resource requirements in local-execution mode, only resource limits are used.
2026-01-08 14:44:36 - INFO - Unable to find pqos_wrapper, please install it for cache allocation and monitoring if your CPU supports Intel RDT (cf. https://gitlab.com/sosy-lab/software/pqos-wrapper).
executing run set 'esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites' (1 file)
14:44:36 06_fuzzle_50x50_0-cycle.yml TIMEOUT 97.30 73.34
executing run set 'fusebmc-coverage-error-call-confirmed.fusebmc-test-suites' (1 file)
14:45:50 array_ptr_single_elem_init-2.yml TIMEOUT 96.14 59.47
executing run set 'hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites' (1 file)
14:46:50 btor2c-lazyMod.driving_phils.3.prop1-func-interl.yml true 20.28 11.52
executing run set 'kleef-coverage-error-call-confirmed.kleef-test-suites' (1 file)
14:47:01 s4liff.yml true 9.48 5.71
executing run set 'owic-coverage-error-call-confirmed.owic-test-suites' (1 file)
14:47:07 elevator_spec3_product23.cil.yml true 15.90 9.06
executing run set 'tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites' (1 file)
14:47:17 list-2.yml true 14.92 8.70
executing run set 'utestgen-coverage-error-call-confirmed.utestgen-test-suites' (1 file)
14:47:26 rangesum.yml true 10.30 6.17
executing run set 'wasp-c-coverage-error-call-confirmed.wasp-c-test-suites' (1 file)
14:47:32 brs5f.yml true 9.28 5.64
Statistics: 8 Files
correct: 0
correct true: 0
correct false: 0
incorrect: 0
incorrect true: 0
incorrect false: 0
unknown: 2
In order to get HTML and CSV tables, run
benchexec/bin/table-generator results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites.xml.bz2 results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.fusebmc-coverage-error-call-confirmed.fusebmc-test-suites.xml.bz2 results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites.xml.bz2 results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.kleef-coverage-error-call-confirmed.kleef-test-suites.xml.bz2 results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.owic-coverage-error-call-confirmed.owic-test-suites.xml.bz2 results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites.xml.bz2 results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.utestgen-coverage-error-call-confirmed.utestgen-test-suites.xml.bz2 results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.wasp-c-coverage-error-call-confirmed.wasp-c-test-suites.xml.bz2
This validates the witnesses generated from the sample set of test suites. The output shows where a validator confirms the result: if the line of a task reports false(unreach-call), the witness was confirmed. Our subset contains one case, array_ptr_single_elem_init-2.yml, where both validators run into a timeout. In Table 2 of the paper, this task would contribute to the ~15% of unconfirmed witnesses of fusebmc.
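The confirmation rule can be written down explicitly. The following is a minimal sketch that counts a witness as confirmed if at least one validator reports false(unreach-call), applied to the statuses printed above (Witch first, CPAchecker second). How the plot script treats other statuses, such as the true printed in the CPAchecker run, is not stated here, so the sketch applies only the stated rule. All names are our own.

```python
CONFIRMED_STATUS = "false(unreach-call)"


def is_confirmed(statuses):
    """A witness counts as confirmed if any validator reports the violation."""
    return any(s.strip().startswith(CONFIRMED_STATUS) for s in statuses)


def confirmation_rate(results):
    """results maps a task name to the list of validator statuses."""
    total = len(results)
    confirmed = sum(1 for statuses in results.values() if is_confirmed(statuses))
    percent = 100.0 * confirmed / total if total else 0.0
    return confirmed, total, percent


# Statuses copied from the output above: [Witch, CPAchecker].
SMOKE_RESULTS = {
    "06_fuzzle_50x50_0-cycle.yml": ["false(unreach-call)", "TIMEOUT"],
    "array_ptr_single_elem_init-2.yml": [
        "TIMEOUT (validation preprocessing)",
        "TIMEOUT",
    ],
    "btor2c-lazyMod.driving_phils.3.prop1-func-interl.yml": [
        "false(unreach-call)",
        "true",
    ],
    "s4liff.yml": ["false(unreach-call)", "true"],
    "elevator_spec3_product23.cil.yml": ["false(unreach-call)", "true"],
    "list-2.yml": ["false(unreach-call)", "true"],
    "rangesum.yml": ["false(unreach-call)", "true"],
    "brs5f.yml": ["false(unreach-call)", "true"],
}

if __name__ == "__main__":
    print(confirmation_rate(SMOKE_RESULTS))  # prints (7, 8, 87.5)
```

Under this rule, 7 of the 8 sample witnesses are confirmed, and the only unconfirmed one is the fusebmc task on which both validators time out.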
We can recreate the plots by first updating the validation.table.csv and then recreating the plots.
make -B table-defs/validation.table.csv
make plots
Expected Output:
Generating validation.xml...
Generated validation.xml successfully
Generating validation.csv...
benchexec/bin/table-generator -x table-defs/validation.xml --no-diff -o table-defs -f csv
INFO: Reading table definition from 'table-defs/validation.xml'...
INFO: table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites.xml.bz2
INFO: table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites.xml.bz2
INFO: table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.fusebmc-coverage-error-call-confirmed.fusebmc-test-suites.xml.bz2
INFO: table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites.xml.bz2
INFO: table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.kleef-coverage-error-call-confirmed.kleef-test-suites.xml.bz2
INFO: table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.owic-coverage-error-call-confirmed.owic-test-suites.xml.bz2
INFO: table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites.xml.bz2
INFO: table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.utestgen-coverage-error-call-confirmed.utestgen-test-suites.xml.bz2
INFO: table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.wasp-c-coverage-error-call-confirmed.wasp-c-test-suites.xml.bz2
INFO: table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.fusebmc-coverage-error-call-confirmed.fusebmc-test-suites.xml.bz2
INFO: table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites.xml.bz2
INFO: table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.kleef-coverage-error-call-confirmed.kleef-test-suites.xml.bz2
INFO: table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.owic-coverage-error-call-confirmed.owic-test-suites.xml.bz2
INFO: table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites.xml.bz2
INFO: table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.utestgen-coverage-error-call-confirmed.utestgen-test-suites.xml.bz2
INFO: table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.wasp-c-coverage-error-call-confirmed.wasp-c-test-suites.xml.bz2
INFO: Merging results...
INFO: The resulting table will have 8 rows and 6 columns (in 2 run sets).
INFO: Generating table...
INFO: Writing CSV into table-defs/validation.table.csv ...
INFO: done
Generated validation.csv successfully
Copying data for plots...
mkdir -p analysis/raw/
cp table-defs/portfolio-cpachecker.table.csv analysis/raw/
cp table-defs/portfolio-esbmc-kind.table.csv analysis/raw/
cp table-defs/portfolio-uautomizer.table.csv analysis/raw/
cp table-defs/portfolio-symbiotic.table.csv analysis/raw/
cp table-defs/portfolio-bubaak.table.csv analysis/raw/
cp table-defs/portfolio-cpv.table.csv analysis/raw/
cp data/reach_safety.table.csv analysis/raw/
cp results-verified/META_Cover-Error.table.csv analysis/raw/
cp table-defs/creation.table.csv analysis/raw/
cp table-defs/validation.table.csv analysis/raw/
Copied data successfully.
Creating plots...
cd analysis && python3 plot.py
RQ1: Converted tasks (done): 8
< 1.0s: 4/8 (50.00%)
< 0.2s: 0/8 (0.00%)
Created plots successfully.
Plots are in the analysis/final_plots/ directory
Inspecting Table 2 shows the updated results:
cat analysis/final_plots/Table02.tex
Expected Output:
\begin{tabular}{l S[table-format=5.0] S[table-format=5.0] S[table-format=5.0] S[table-format=3.1]}
\toprule
Tester & \multicolumn{1}{c}{\#\,Suites} & \multicolumn{1}{c}{\#\,Converted} & \multicolumn{1}{c}{\#\,Confirmed} & \multicolumn{1}{c}{\%\,Confirmed}\\
\midrule
esbmc-incr & 1 & 1 & 1 & 100.0\\
fusebmc & 1 & 1 & 0 & 0.0\\
hybridtiger & 1 & 1 & 1 & 100.0\\
kleef & 1 & 1 & 1 & 100.0\\
owic & 1 & 1 & 1 & 100.0\\
tracerx-wp & 1 & 1 & 1 & 100.0\\
utestgen & 1 & 1 & 1 & 100.0\\
wasp-c & 1 & 1 & 1 & 100.0\\
\midrule
Total & 8 & 8 & 7 & 87.5\\
\bottomrule
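As a quick sanity check, the Total row of the generated table can be compared with the sum of the per-tester rows. The following is a minimal sketch that parses rows of the shape shown above (a name followed by three integer columns and a percentage); the function names are our own, and the percentage column is not checked.

```python
from pathlib import Path


def parse_rows(tex):
    """Map each row name to its (suites, converted, confirmed) counts."""
    rows = {}
    for line in tex.splitlines():
        line = line.strip()
        if not line.endswith("\\\\") or "&" not in line:
            continue  # not a table row
        cells = [c.strip() for c in line[:-2].split("&")]
        if len(cells) != 5:
            continue
        try:
            rows[cells[0]] = tuple(int(c) for c in cells[1:4])
        except ValueError:
            continue  # header row with \multicolumn cells
    return rows


def totals_consistent(rows):
    """Check that the Total row equals the column-wise sum of the other rows."""
    if "Total" not in rows:
        return False
    others = [v for k, v in rows.items() if k != "Total"]
    sums = tuple(sum(col) for col in zip(*others))
    return sums == rows["Total"]


if __name__ == "__main__":
    table = Path("analysis/final_plots/Table02.tex")
    if table.is_file():
        rows = parse_rows(table.read_text())
        print(rows)
        print("totals consistent:", totals_consistent(rows))
```

For the table shown above, the per-tester rows sum to 8 suites, 8 converted, and 7 confirmed, which agrees with the Total row.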
3.3 RQ2: Test Generators in SV-COMP
This step is empirical and uses data from previous competitions: Test-Comp 25 and SV-COMP 25. Both competitions provide respective artifacts. We include the subset necessary to run our evaluation in this artifact.
The following is for reference only. You do not need to download either of these artifacts.
- Artifact of Test-Comp 25: DOI 10.5282/ubm/data.667 (107 GB)
- Artifact of SV-COMP 25: DOI 10.5281/zenodo.15012085 (54 GB)
If you ran the plot generation in Step 2, nothing more is required: Figure07 and Figure08 already show our results from the empirical analysis.
If you did not do Step 2:
⚠️ This command will remove results you already generated into results and results-validate. If you wish to keep those results, move them somewhere outside the fase26 folder.
⚠️ Restoring the files takes about 5 min. The restored data occupies about 8.1 GB.
rm -r results results-validate
make restore-experiment-data
make -B table-defs/validation.table.csv table-defs/creation.table.csv
make plots
Expected Output:
Restoring experiment data...
Restoring from backups/experiment-data.tar.gz
8.17GiB 0:04:50 [28.8MiB/s] [ <=> ]
Generating validation.xml...
Generated validation.xml successfully
Generating validation.csv...
PYTHONPATH=tool_info benchexec/bin/table-generator -x table-defs/validation.xml --no-diff -o table-defs -f csv
INFO: Reading table definition from 'table-defs/validation.xml'...
INFO: table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.cetfuzz-coverage-error-call-confirmed.cetfuzz-test-suites.xml.bz2
INFO: table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.cetfuzz-coverage-error-call-confirmed.cetfuzz-test-suites.xml.bz2
INFO: table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites.xml.bz2
INFO: table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites.xml.bz2
INFO: table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.esbmc-kind-coverage-error-call-confirmed.esbmc-kind-test-suites.xml.bz2
INFO: table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.esbmc-kind-coverage-error-call-confirmed.esbmc-kind-test-suites.xml.bz2
INFO: table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.fdse-coverage-error-call-confirmed.fdse-test-suites.xml.bz2
INFO: table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.fdse-coverage-error-call-confirmed.fdse-test-suites.xml.bz2
INFO: table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.fizzer-coverage-error-call-confirmed.fizzer-test-suites.xml.bz2
INFO: table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.fizzer-coverage-error-call-confirmed.fizzer-test-suites.xml.bz2
INFO: table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.fusebmc-coverage-error-call-confirmed.fusebmc-test-suites.xml.bz2
INFO: table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.fusebmc-coverage-error-call-confirmed.fusebmc-test-suites.xml.bz2
INFO: table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.fusebmc-ia-coverage-error-call-confirmed.fusebmc-ia-test-suites.xml.bz2
INFO: table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.fusebmc-ia-coverage-error-call-confirmed.fusebmc-ia-test-suites.xml.bz2
INFO: table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites.xml.bz2
INFO: table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites.xml.bz2
INFO: table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.klee-coverage-error-call-confirmed.klee-test-suites.xml.bz2
INFO: table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.klee-coverage-error-call-confirmed.klee-test-suites.xml.bz2
INFO: table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.kleef-coverage-error-call-confirmed.kleef-test-suites.xml.bz2
INFO: table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.kleef-coverage-error-call-confirmed.kleef-test-suites.xml.bz2
INFO: table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.legion-coverage-error-call-confirmed.legion-test-suites.xml.bz2
INFO: table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.legion-symcc-coverage-error-call-confirmed.legion-symcc-test-suites.xml.bz2
INFO: table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.owic-coverage-error-call-confirmed.owic-test-suites.xml.bz2
INFO: table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.prtest-coverage-error-call-confirmed.prtest-test-suites.xml.bz2
INFO: table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.legion-coverage-error-call-confirmed.legion-test-suites.xml.bz2
INFO: table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.legion-symcc-coverage-error-call-confirmed.legion-symcc-test-suites.xml.bz2
INFO: table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.owic-coverage-error-call-confirmed.owic-test-suites.xml.bz2
INFO: table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.rizzer-coverage-error-call-confirmed.rizzer-test-suites.xml.bz2
INFO: table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.prtest-coverage-error-call-confirmed.prtest-test-suites.xml.bz2
INFO: table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.rizzer-coverage-error-call-confirmed.rizzer-test-suites.xml.bz2
INFO: table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.symbiotic-coverage-error-call-confirmed.symbiotic-test-suites.xml.bz2
INFO: table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.symbiotic-coverage-error-call-confirmed.symbiotic-test-suites.xml.bz2
INFO: table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.tracerx-coverage-error-call-confirmed.tracerx-test-suites.xml.bz2
INFO: table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites.xml.bz2
INFO: table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.tracerx-coverage-error-call-confirmed.tracerx-test-suites.xml.bz2
INFO: table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites.xml.bz2
INFO: table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.utestgen-coverage-error-call-confirmed.utestgen-test-suites.xml.bz2
INFO: table-defs/../results-validate/witch-validate-violation-witnesses-2.0.2025-08-05_23-18-21.results.wasp-c-coverage-error-call-confirmed.wasp-c-test-suites.xml.bz2
INFO: table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.utestgen-coverage-error-call-confirmed.utestgen-test-suites.xml.bz2
INFO: table-defs/../results-validate/cpachecker-validate-violation-witnesses-2.0.2025-08-05_23-17-34.results.wasp-c-coverage-error-call-confirmed.wasp-c-test-suites.xml.bz2
INFO: Merging results...
INFO: The resulting table will have 10941 rows and 6 columns (in 2 run sets).
INFO: Generating table...
INFO: Writing CSV into table-defs/validation.table.csv ...
INFO: done
Generated validation.csv successfully
Generating creation.table.csv...
PYTHONPATH=tool_info benchexec/bin/table-generator -x table-defs/creation.xml -f csv --no-diff -o table-defs/
INFO: Reading table definition from 'table-defs/creation.xml'...
INFO: table-defs/../results/test2witness.2025-10-16_10-29-27.results.cetfuzz-coverage-error-call-confirmed.cetfuzz-test-suites.xml.bz2
INFO: table-defs/../results/test2witness.2025-10-16_10-29-27.results.esbmc-incr-coverage-error-call-confirmed.esbmc-incr-test-suites.xml.bz2
INFO: table-defs/../results/test2witness.2025-10-16_10-29-27.results.esbmc-kind-coverage-error-call-confirmed.esbmc-kind-test-suites.xml.bz2
INFO: table-defs/../results/test2witness.2025-10-16_10-29-27.results.fdse-coverage-error-call-confirmed.fdse-test-suites.xml.bz2
INFO: table-defs/../results/test2witness.2025-10-16_10-29-27.results.fizzer-coverage-error-call-confirmed.fizzer-test-suites.xml.bz2
INFO: table-defs/../results/test2witness.2025-10-16_10-29-27.results.fusebmc-coverage-error-call-confirmed.fusebmc-test-suites.xml.bz2
INFO: table-defs/../results/test2witness.2025-10-16_10-29-27.results.fusebmc-ia-coverage-error-call-confirmed.fusebmc-ia-test-suites.xml.bz2
INFO: table-defs/../results/test2witness.2025-10-16_10-29-27.results.hybridtiger-coverage-error-call-confirmed.hybridtiger-test-suites.xml.bz2
INFO: table-defs/../results/test2witness.2025-10-16_10-29-27.results.klee-coverage-error-call-confirmed.klee-test-suites.xml.bz2
INFO: table-defs/../results/test2witness.2025-10-16_10-29-27.results.kleef-coverage-error-call-confirmed.kleef-test-suites.xml.bz2
INFO: table-defs/../results/test2witness.2025-10-16_10-29-27.results.legion-coverage-error-call-confirmed.legion-test-suites.xml.bz2
INFO: table-defs/../results/test2witness.2025-10-16_10-29-27.results.legion-symcc-coverage-error-call-confirmed.legion-symcc-test-suites.xml.bz2
INFO: table-defs/../results/test2witness.2025-10-16_10-29-27.results.owic-coverage-error-call-confirmed.owic-test-suites.xml.bz2
INFO: table-defs/../results/test2witness.2025-10-16_10-29-27.results.prtest-coverage-error-call-confirmed.prtest-test-suites.xml.bz2
INFO: table-defs/../results/test2witness.2025-10-16_10-29-27.results.rizzer-coverage-error-call-confirmed.rizzer-test-suites.xml.bz2
INFO: table-defs/../results/test2witness.2025-10-16_10-29-27.results.symbiotic-coverage-error-call-confirmed.symbiotic-test-suites.xml.bz2
INFO: table-defs/../results/test2witness.2025-10-16_10-29-27.results.tracerx-coverage-error-call-confirmed.tracerx-test-suites.xml.bz2
INFO: table-defs/../results/test2witness.2025-10-16_10-29-27.results.tracerx-wp-coverage-error-call-confirmed.tracerx-wp-test-suites.xml.bz2
INFO: table-defs/../results/test2witness.2025-10-16_10-29-27.results.utestgen-coverage-error-call-confirmed.utestgen-test-suites.xml.bz2
INFO: table-defs/../results/test2witness.2025-10-16_10-29-27.results.wasp-c-coverage-error-call-confirmed.wasp-c-test-suites.xml.bz2
INFO: Merging results...
INFO: The resulting table will have 10941 rows and 3 columns (in 1 run sets).
INFO: Generating table...
INFO: Writing CSV into table-defs/creation.table.csv ...
INFO: done
Generated creation.table.csv successfully
Copying data for plots...
mkdir -p analysis/raw/
cp table-defs/portfolio-cpachecker.table.csv analysis/raw/
cp table-defs/portfolio-esbmc-kind.table.csv analysis/raw/
cp table-defs/portfolio-uautomizer.table.csv analysis/raw/
cp table-defs/portfolio-symbiotic.table.csv analysis/raw/
cp table-defs/portfolio-bubaak.table.csv analysis/raw/
cp table-defs/portfolio-cpv.table.csv analysis/raw/
cp data/reach_safety.table.csv analysis/raw/
cp results-verified/META_Cover-Error.table.csv analysis/raw/
cp table-defs/creation.table.csv analysis/raw/
cp table-defs/validation.table.csv analysis/raw/
Copied data successfully.
Creating plots...
cd analysis && python3 plot.py
RQ1: Converted tasks (done): 10897
< 1.0s: 10490/10897 (96.27%)
< 0.2s: 8069/10897 (74.05%)
Created plots successfully.
Plots are in the analysis/final_plots/ directory
After this command has completed, you can compare Figures 07 and 08 with the ones in the paper or in the folder ~/fase26/expected.
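The summary lines in the output above report how many converted tasks finished below a time threshold. A minimal sketch of that kind of computation follows, assuming a plain list of run times in seconds. The function name `share_below` and the sample values are hypothetical and are not taken from plot.py.

```python
def share_below(times, threshold):
    """Return (count, total, percent) of runs that took less than threshold seconds."""
    total = len(times)
    count = sum(1 for t in times if t < threshold)
    percent = 100.0 * count / total if total else 0.0
    return count, total, percent


# Hypothetical run times in seconds, not data from the artifact.
sample = [0.05, 0.12, 0.19, 0.4, 0.9, 1.5]

for limit in (1.0, 0.2):
    count, total, percent = share_below(sample, limit)
    print(f"< {limit}s: {count}/{total} ({percent:.2f}%)")
```

The same arithmetic applied to the numbers in the log (10490 of 10897 and 8069 of 10897) gives the percentages shown there.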
3.4 RQ3: Efficiency of Parallel Portfolio Experiments
In RQ3 we claim that, while the portfolio consumes more CPU time on average than the verifier alone, there are tasks where the portfolio consumes less. We demonstrate this on one example with uautomizer and afltc.
First, back up the portfolio results:
cd $HOME/fase26
mv results-portfolio results-portfolio.bak
Then make a small adjustment so that AFL (part of the afltc tester) works:
echo core | sudo tee /proc/sys/kernel/core_pattern
When prompted for the password enter: fase26
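To confirm that the adjustment took effect, the sketch below reads the kernel setting back. It assumes that AFL typically refuses to start when core dumps are piped to an external handler (a pattern starting with `|`). The helper name `core_pattern_ok` is hypothetical and not part of the artifact.

```python
from pathlib import Path

CORE_PATTERN = "/proc/sys/kernel/core_pattern"


def core_pattern_ok(path=CORE_PATTERN):
    """Return True if core dumps are not piped to an external handler."""
    pattern = Path(path).read_text().strip()
    return not pattern.startswith("|")


# Only query the real setting where the file exists (Linux).
if Path(CORE_PATTERN).exists():
    print("core_pattern OK:", core_pattern_ok())
```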
To execute the minimal example run for RQ3, run:
make portfolio-rq3
This executes Ultimate Automizer (uautomizer) in a portfolio with afltc, followed by Ultimate Automizer standalone, on the task sv-benchmarks/c/array-fpi/eqn1f.yml.
Expected Output:
PYTHONPATH=combinations benchexec/bin/benchexec --read-only-dir=/ --hidden-dir /tmp --overlay-dir=/home -o results-portfolio/ --tool-directory combinations benchmark-defs/combinations/parallel-uautomizer-afltc-rq3.xml
2026-01-08 15:36:23 - WARNING - Cannot determine combinations/ppexec version, error output: mkdir: cannot create directory ‘/sys/fs/cgroup/init’: Read-only file system
mkdir: cannot create directory ‘/sys/fs/cgroup/benchexec’: Read-only file system
2026-01-08 15:36:23 - WARNING - Ignoring specified resource requirements in local-execution mode, only resource limits are used.
2026-01-08 15:36:23 - INFO - Unable to find pqos_wrapper, please install it for cache allocation and monitoring if your CPU supports Intel RDT (cf. https://gitlab.com/sosy-lab/software/pqos-wrapper).
executing run set 'pp-uautomizer-afltc.eqn1f' (1 file)
15:36:23 eqn1f.yml false 41.73 18.45
Statistics: 1 Files
correct: 1
correct true: 0
correct false: 1
incorrect: 0
incorrect true: 0
incorrect false: 0
unknown: 0
Score: 1 (max: 1)
In order to get HTML and CSV tables, run
benchexec/bin/table-generator results-portfolio/parallel-uautomizer-afltc-rq3.2026-01-08_15-36-23.results.pp-uautomizer-afltc.eqn1f.xml.bz2
PYTHONPATH=combinations benchexec/bin/benchexec --read-only-dir=/ --hidden-dir /tmp --overlay-dir=/home -o results-portfolio/ --tool-directory combinations -o results-baseline/ benchmark-defs/combinations/uautomizer-rq3.xml
2026-01-08 15:36:43 - WARNING - Cannot determine combinations/ppexec version, error output: mkdir: cannot create directory ‘/sys/fs/cgroup/init’: Read-only file system
mkdir: cannot create directory ‘/sys/fs/cgroup/benchexec’: Read-only file system
2026-01-08 15:36:43 - WARNING - Ignoring specified resource requirements in local-execution mode, only resource limits are used.
2026-01-08 15:36:43 - INFO - Unable to find pqos_wrapper, please install it for cache allocation and monitoring if your CPU supports Intel RDT (cf. https://gitlab.com/sosy-lab/software/pqos-wrapper).
executing run set 'uautomizer-reference.eqn1f' (1 file)
15:36:43 eqn1f.yml false 83.05 48.93
Statistics: 1 Files
correct: 1
correct true: 0
correct false: 1
incorrect: 0
incorrect true: 0
incorrect false: 0
unknown: 0
Score: 1 (max: 1)
In order to get HTML and CSV tables, run
benchexec/bin/table-generator results-baseline/uautomizer-rq3.2026-01-08_15-36-43.results.uautomizer-reference.eqn1f.xml.bz2
The first run shows the portfolio of uautomizer and afltc, which consumes 41.73s of CPU time in the example above. The second run is uautomizer alone, which consumes 83.05s of CPU time.
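The comparison can also be done programmatically. The sketch below parses the two result lines from the expected output, assuming the console layout is start time, task, status, CPU time and wall time, separated by whitespace. The helper `parse_run_line` is hypothetical, and your measured times will differ.

```python
def parse_run_line(line):
    """Split a console result line into its fields.

    Assumed layout: start time, task, status, CPU time (s), wall time (s).
    """
    start, task, status, cpu, wall = line.split()
    return {"task": task, "status": status, "cpu": float(cpu), "wall": float(wall)}


# Result lines copied from the expected output above.
portfolio = parse_run_line("15:36:23 eqn1f.yml false 41.73 18.45")
baseline = parse_run_line("15:36:43 eqn1f.yml false 83.05 48.93")

saved = baseline["cpu"] - portfolio["cpu"]
share = 100 * saved / baseline["cpu"]
print(f"CPU time saved by the portfolio: {saved:.2f}s ({share:.1f}% of the baseline)")
```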
3.5 RQ4: Effectiveness of Parallel Portfolio Experiments
In RQ4 we claim that the portfolios can solve tasks that the verifier alone cannot solve. To show this, we select one task, sv-benchmarks/c/reducercommutativity/rangesum.yml, and the portfolio of cpachecker and fusebmc.
⚠️ We assume you have moved results-portfolio to results-portfolio.bak in Step 3.4.
To run the example task execute:
make portfolio-rq4
This first runs the portfolio cpachecker-fusebmc on the task, then cpachecker alone.
⚠️ cpachecker alone is expected to time out after 900s, so this target takes a little over 16 minutes to complete.
Expected Output:
PYTHONPATH=combinations benchexec/bin/benchexec --read-only-dir=/ --hidden-dir /tmp --overlay-dir=/home -o results-portfolio/ --tool-directory combinations benchmark-defs/combinations/parallel-cpachecker-fusebmc-rq4.xml
2026-01-08 15:42:53 - WARNING - Cannot determine combinations/ppexec version, error output: mkdir: cannot create directory ‘/sys/fs/cgroup/init’: Read-only file system
mkdir: cannot create directory ‘/sys/fs/cgroup/benchexec’: Read-only file system
2026-01-08 15:42:53 - WARNING - Ignoring specified resource requirements in local-execution mode, only resource limits are used.
2026-01-08 15:42:53 - INFO - Unable to find pqos_wrapper, please install it for cache allocation and monitoring if your CPU supports Intel RDT (cf. https://gitlab.com/sosy-lab/software/pqos-wrapper).
executing run set 'pp-cpachecker-fusebmc.rangesum' (1 file)
15:42:53 rangesum.yml false 35.76 19.78
Statistics: 1 Files
correct: 1
correct true: 0
correct false: 1
incorrect: 0
incorrect true: 0
incorrect false: 0
unknown: 0
Score: 1 (max: 1)
In order to get HTML and CSV tables, run
benchexec/bin/table-generator results-portfolio/parallel-cpachecker-fusebmc-rq4.2026-01-08_15-42-53.results.pp-cpachecker-fusebmc.rangesum.xml.bz2
PYTHONPATH=combinations benchexec/bin/benchexec --read-only-dir=/ --hidden-dir /tmp --overlay-dir=/home -o results-portfolio/ --tool-directory combinations -o results-baseline/ benchmark-defs/combinations/cpachecker-rq4.xml
2026-01-08 15:43:14 - WARNING - Cannot determine combinations/ppexec version, error output: mkdir: cannot create directory ‘/sys/fs/cgroup/init’: Read-only file system
mkdir: cannot create directory ‘/sys/fs/cgroup/benchexec’: Read-only file system
2026-01-08 15:43:14 - WARNING - Ignoring specified resource requirements in local-execution mode, only resource limits are used.
2026-01-08 15:43:14 - INFO - Unable to find pqos_wrapper, please install it for cache allocation and monitoring if your CPU supports Intel RDT (cf. https://gitlab.com/sosy-lab/software/pqos-wrapper).
executing run set 'cpachecker-reference.rangesum' (1 file)
15:43:14 rangesum.yml TIMEOUT 902.80 869.79
Statistics: 1 Files
correct: 0
correct true: 0
correct false: 0
incorrect: 0
incorrect true: 0
incorrect false: 0
unknown: 1
Score: 0 (max: 1)
In order to get HTML and CSV tables, run
benchexec/bin/table-generator results-baseline/cpachecker-rq4.2026-01-08_15-43-14.results.cpachecker-reference.rangesum.xml.bz2
We can see that the portfolio solves the task in 35.76s of CPU time, while cpachecker alone runs into a timeout.
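The results are stored as bz2-compressed XML files, so the status and CPU time can also be read directly. The sketch below assumes that each run is a `<run name="...">` element with `<column title="..." value="..."/>` children. This layout is an assumption about the result format, and the sample file written here is synthetic, modeled on the timeout above.

```python
import bz2
import os
import tempfile
import xml.etree.ElementTree as ET


def read_runs(path):
    """Return (task name, status, cputime) tuples from a compressed results file.

    Assumes <run name="..."> elements with <column title="..." value="..."/>
    children; adjust the titles if your files differ.
    """
    with bz2.open(path, "rb") as handle:
        root = ET.parse(handle).getroot()
    rows = []
    for run in root.iter("run"):
        columns = {c.get("title"): c.get("value") for c in run.iter("column")}
        rows.append((run.get("name"), columns.get("status"), columns.get("cputime")))
    return rows


# Synthetic sample file, not a file from the artifact.
SAMPLE = (
    b'<result><run name="rangesum.yml">'
    b'<column title="status" value="TIMEOUT"/>'
    b'<column title="cputime" value="902.80s"/>'
    b"</run></result>"
)

with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "demo.xml.bz2")
    with bz2.open(sample_path, "wb") as out:
        out.write(SAMPLE)
    rows = read_runs(sample_path)

print(rows)
```

To inspect a real run, pass the path of one of the .xml.bz2 files printed at the end of the benchexec output to `read_runs`.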
4. Running the Full Set of Experiments
⚠️ Warning: Long-running experiments
The full experimental evaluation is computationally expensive:
- Witness generation may take 10–20 minutes inside the VM.
- Witness validation may take several hours up to a full day.
- The parallel portfolio experiments will take multiple days to weeks on a 4-core setup.
Proceed only if you are prepared for these runtimes.
4.1 RQ1: Full Witness Generation
Follow the setup from 3.1. Then, instead of the small variant, run the full experiment:
cd $HOME/fase26
make witnesses-from-testsuites
4.2 RQ1: Full Witness Validation
Follow the setup from 3.2. Then, instead of the small variant, run the full experiment:
make validate-generated-witnesses
⚠️ This benchmark tries to match the entire sv-benchmarks suite against the existing witnesses. This leads to a lot of messages of the form:
2026-01-08 20:45:21 - WARNING - No files found matching '../results/test2witness.2025-10-16_10-29-27.files/${rundefinition_name}/${taskdef_name}/output/witness.yml'.
2026-01-08 20:45:21 - WARNING - Pattern ../results/test2witness.2025-10-16_10-29-27.files/${rundefinition_name}/${taskdef_name}/output/witness.yml in requiredfiles tag did not match any file for task sv-benchmarks/c/xcsp/aim-200-6-0-sat-4.yml.
This is normal and expected.
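Because these expected messages can bury other problems, it may help to filter them out of a saved log. The sketch below keeps only WARNING lines that match neither of the two shapes quoted above. The regular expressions and the third sample line are illustrative assumptions, not part of the artifact.

```python
import re

# Patterns for the two expected warning shapes quoted above.
EXPECTED = (
    re.compile(r"WARNING - No files found matching '.*witness\.yml'\."),
    re.compile(r"WARNING - Pattern .*witness\.yml in requiredfiles tag did not match any file"),
)


def unexpected_warnings(lines):
    """Return WARNING lines that match none of the expected patterns."""
    return [
        line
        for line in lines
        if "WARNING" in line and not any(p.search(line) for p in EXPECTED)
    ]


log = [
    "2026-01-08 20:45:21 - WARNING - No files found matching "
    "'../results/test2witness.2025-10-16_10-29-27.files/"
    "${rundefinition_name}/${taskdef_name}/output/witness.yml'.",
    "2026-01-08 20:45:21 - WARNING - Pattern "
    "../results/test2witness.2025-10-16_10-29-27.files/"
    "${rundefinition_name}/${taskdef_name}/output/witness.yml in requiredfiles tag "
    "did not match any file for task sv-benchmarks/c/xcsp/aim-200-6-0-sat-4.yml.",
    # Hypothetical line that is not one of the expected messages.
    "2026-01-08 20:45:22 - WARNING - Some other problem.",
]

for line in unexpected_warnings(log):
    print(line)
```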
4.3 RQ2: Test Generators in SV-COMP
The steps are identical to 3.3; there is nothing new to do.
4.4 RQ3 & RQ4: Full Parallel Portfolio Evaluation
Follow the setup from 3.4. Then, instead of only our examples, run the full experiment.
⚠️ Warning: Even a single combination will take several hours to days to complete on a 4-core VM.
Evaluated Tools
| Verifiers | Test Generators |
|---|---|
| CPAchecker | prtest |
| UAutomizer | fusebmc |
| ESBMC-kind | fizzer |
| Symbiotic | afltc |
| Bubaak | |
| CPV | |
Each combination corresponds to a Make target of the form <verifier>-<test-generator>.
To run a specific combination, for example CPAchecker with prtest, execute:
make cpachecker-prtest
To run all 24 evaluated combinations, execute:
make par-all
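The 24 combinations are the cross product of the six verifiers and the four test generators in the table above. The sketch below enumerates the corresponding Make targets, assuming each target uses the lowercase tool names seen in the artifact's file names (for example esbmc-kind). Check the Makefile if a name does not match.

```python
from itertools import product

# Lowercase names, assumed to match the Make target spelling.
VERIFIERS = ["cpachecker", "uautomizer", "esbmc-kind", "symbiotic", "bubaak", "cpv"]
TEST_GENERATORS = ["prtest", "fusebmc", "fizzer", "afltc"]

# One target per verifier and test generator pair: <verifier>-<test-generator>.
targets = [f"{verifier}-{generator}" for verifier, generator in product(VERIFIERS, TEST_GENERATORS)]

print(f"{len(targets)} combinations")
for target in targets:
    print(f"make {target}")
```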
4.5 Recreating tables and plots
⚠️ Note: Recreating the plots runs a Python script that permanently updates table-defs/portfolio-*.xml to include the latest portfolio runs. We recommend creating a backup of table-defs first: cp -r table-defs table-defs.bak
To completely recreate the plots and their required data (after the experiments have run), execute:
make -B plots
Files (35.7 GB)
| Checksum | Size |
|---|---|
| md5:c846ebb396f8b174b10ded4771514fcc | 10.3 kB |
| md5:3c1acf1aa96a894ab1559fafb41cc7d8 | 50.5 kB |
| md5:0d3653893d8c5442ed69325833c969d8 | 213 Bytes |
| md5:e80e1b7154684d2e687e096104a4db0b | 853 Bytes |
| md5:734d444cdddb758e819ed5fb9a66b355 | 35.7 GB |