Raw DIMACS Formulas Derived from Kconfig Models for Research on Highly Configurable Software
Authors/Creators
- 1. Universidad Nacional de Educacion a Distancia
- 2. Universidad Nacional de Educación a Distancia (UNED)
- 3. Universidad Nacional de Educación a Distancia
Description
A Dataset of Raw DIMACS Formulas Derived from Kconfig Models
This repository contains 5,476 raw DIMACS-encoded Boolean formulas derived from Kconfig variability models of nine open-source systems. Each model is translated by two state-of-the-art tools, KconfigReader and KMax, yielding two DIMACS files per model (2,738 models x 2 translators = 5,476 files).
All formulas are provided without any preprocessing (no backbone simplification, no atomic-set reduction, etc.). This makes the dataset suitable for benchmarking preprocessing algorithms themselves, as well as for evaluating how raw formula complexity affects downstream reasoning tasks such as SAT solving, model counting, and configuration sampling.
Systems
| System | Description |
|---|---|
| axTLS | Embedded TLS library |
| Buildroot | Embedded Linux build system |
| BusyBox | Unix utilities for embedded systems |
| EmbToolkit | Embedded systems toolkit |
| Freetz-NG | Fritz!Box firmware modification |
| L4Re | L4 Runtime Environment microkernel |
| Linux kernel | 43 architectures (Alpha, Arc, ARM, ARM64, i386, x86_64, ...) |
| Toybox | Command-line utilities |
| uClibc | Embedded C library |
The dataset covers multiple tagged releases per system. Formulas range from 22 to 73,719 variables and from 18 to 7,406,667 clauses.
Repository Structure
.
├── dimacs/ # 5,476 DIMACS files
│ ├── Axtls__1-0-0__KConfigReader.dimacs
│ ├── Axtls__1-0-0__KMax.dimacs
│ ├── ...
│ └── LinuxX8664__6-5__KMax.dimacs
├── summary.csv # Per-formula metrics
├── replication.sh # Bash script to regenerate the dataset
└── README.md
File Naming Convention
Each DIMACS file follows the pattern:
{System}__{Version}__{Translator}.dimacs
- System: system name, with architecture appended for Linux (e.g., LinuxArm64)
- Version: release tag with dots replaced by hyphens (e.g., 6-5 for version 6.5)
- Translator: KConfigReader or KMax
Example: LinuxArm64__6-5__KMax.dimacs
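The naming pattern can be split back into its three fields with a few lines of Python. This is a minimal sketch, not part of the dataset: the names `FormulaName` and `parse_formula_name` are hypothetical, and the sketch assumes that every hyphen in the version field stands for a dot and that the `__` separator never occurs inside a field.

```python
from typing import NamedTuple


class FormulaName(NamedTuple):
    system: str      # e.g. "LinuxArm64"
    version: str     # dotted form, e.g. "6.5"
    translator: str  # "KConfigReader" or "KMax"


def parse_formula_name(file_name: str) -> FormulaName:
    """Split '{System}__{Version}__{Translator}.dimacs' into its parts."""
    suffix = ".dimacs"
    if not file_name.endswith(suffix):
        raise ValueError(f"not a .dimacs file: {file_name!r}")
    parts = file_name[: -len(suffix)].split("__")
    if len(parts) != 3:
        raise ValueError(f"unexpected name layout: {file_name!r}")
    system, version, translator = parts
    # Assumption: every hyphen in the version field replaces a dot.
    return FormulaName(system, version.replace("-", "."), translator)


print(parse_formula_name("LinuxArm64__6-5__KMax.dimacs"))
# FormulaName(system='LinuxArm64', version='6.5', translator='KMax')
```

If a release tag itself contains a hyphen, the recovered version string will differ from the original tag; in that case the Version column of summary.csv is the safer source.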
summary.csv
A CSV file with one row per DIMACS file and the following columns:
| Column | Description |
|---|---|
| Name | File name |
| NumberOfVariables | Number of Boolean variables |
| NumberOfClauses | Number of clauses |
| NumberOfCoreFeatures | Variables that must be true in every satisfying assignment |
| NumberOfDeadFeatures | Variables that must be false in every satisfying assignment |
| MedianLiteralsPerClause | Median number of literals per clause |
| Inf95CovIntLitsPerClause | Lower bound of the 95% coverage interval for literals per clause |
| Sup95CovIntLitsPerClause | Upper bound of the 95% coverage interval for literals per clause |
| Version | System version (with dots) |
| ToolToGetTheFormula | Translator used (KConfigReader or KMax) |
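The summary can be explored with the Python standard library alone. The sketch below assumes a comma-separated file with a header row that uses the column names listed above; the helper `load_summary` is hypothetical, and the sample rows are made up for illustration and do not come from the dataset.

```python
import csv
import io
from collections import Counter


def load_summary(handle):
    """Read summary rows, converting the two size columns to int."""
    rows = []
    for row in csv.DictReader(handle):
        row["NumberOfVariables"] = int(row["NumberOfVariables"])
        row["NumberOfClauses"] = int(row["NumberOfClauses"])
        rows.append(row)
    return rows


# Made-up rows standing in for summary.csv (subset of columns, illustrative values).
SAMPLE = """Name,NumberOfVariables,NumberOfClauses,Version,ToolToGetTheFormula
A__1-0__KMax.dimacs,30,40,1.0,KMax
A__1-0__KConfigReader.dimacs,35,60,1.0,KConfigReader
"""

rows = load_summary(io.StringIO(SAMPLE))
per_tool = Counter(r["ToolToGetTheFormula"] for r in rows)  # formulas per translator
largest = max(rows, key=lambda r: r["NumberOfClauses"])     # row with most clauses
print(dict(per_tool), largest["Name"])
```

To run it against the real file, pass an opened file instead of the in-memory sample, for example `load_summary(open("summary.csv", newline=""))`.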
DIMACS Format
Each file encodes a Boolean formula in Conjunctive Normal Form (CNF). The format is:
- Lines starting with c are comments mapping variable numbers to Kconfig option names.
- The line p cnf <variables> <clauses> declares the number of variables and clauses.
- Each subsequent line is a clause: a space-separated list of non-zero integers terminated by 0. Positive integers denote affirmed literals; negative integers denote negations.
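The three rules above translate into a short reader. This is a sketch under stated assumptions rather than a reference parser: `parse_dimacs` is a hypothetical helper, comment lines are assumed to have the layout `c <number> <name>` (other comments are skipped), and each clause is assumed to fit on one line, as the format description states. The sample formula is made up.

```python
def parse_dimacs(lines):
    """Return (num_vars, num_clauses, names, clauses) from DIMACS CNF lines."""
    num_vars = num_clauses = None
    names = {}    # variable number -> Kconfig option name, taken from comments
    clauses = []  # each clause is a list of signed integers, without the final 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("c"):
            # Assumed comment layout: "c <number> <name>".
            parts = line.split(maxsplit=2)
            if len(parts) == 3 and parts[1].isdigit():
                names[int(parts[1])] = parts[2]
        elif line.startswith("p"):
            _, fmt, n_vars, n_clauses = line.split()
            if fmt != "cnf":
                raise ValueError(f"unsupported format: {fmt!r}")
            num_vars, num_clauses = int(n_vars), int(n_clauses)
        else:
            literals = [int(token) for token in line.split()]
            if literals[-1] != 0:
                raise ValueError(f"clause not terminated by 0: {line!r}")
            clauses.append(literals[:-1])
    return num_vars, num_clauses, names, clauses


# Made-up two-variable formula: FEATURE_A, and FEATURE_A implies FEATURE_B.
SAMPLE = """c 1 FEATURE_A
c 2 FEATURE_B
p cnf 2 2
1 0
-1 2 0
"""

num_vars, num_clauses, names, clauses = parse_dimacs(SAMPLE.splitlines())
print(num_vars, num_clauses, names, clauses)
```

Because the function iterates over lines, an open file object can be passed directly, which avoids loading the largest formulas (several million clauses) into a single string first.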
Replication
The replication.sh script regenerates all DIMACS files from scratch using torte. It requires Docker or Podman:
bash replication.sh
torte clones each system's source repository at every tagged version, runs KconfigReader and KMax inside containers, transforms the output to SMT-LIB 2 format with FeatJAR, and converts it to DIMACS using the Z3 solver.
Authors
Ruben Heradio, Cristina Cerrada, Ismael Abad, Ernesto Aranda-Escolastico, Juan Jose Escribano, David Fernandez-Amoros
Universidad Nacional de Educacion a Distancia (UNED), Madrid, Spain
Acknowledgements
We thank Elias Kuiter for his help and support in using torte.
This work has been funded by FEDER/Spanish Ministry of Science, Innovation and Universities (MCIN)/Agencia Estatal de Investigacion (AEI) under project COSY (PID2022-142043NB-I00).
License
MIT License.
Files
| Name | Size | Checksum |
|---|---|---|
| dataset.zip | 4.8 GB | md5:740cd3ffc6cb0a9ebc3cafac80e5e75d |
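After downloading, the archive can be checked against the MD5 value listed for dataset.zip. The following is a minimal sketch using Python's hashlib; the helper names `md5_of_file` and `matches_expected` are hypothetical, and the file is read in chunks so that the 4.8 GB archive does not need to fit in memory.

```python
import hashlib

# Checksum published alongside dataset.zip.
EXPECTED_MD5 = "740cd3ffc6cb0a9ebc3cafac80e5e75d"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True when the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()
```

Calling `matches_expected("dataset.zip")` in the download directory should return True for an intact copy; the path is an assumption about where the file was saved.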