There is a newer version of the record available.

Published March 2, 2026 | Version v1
Dataset | Open access

Raw DIMACS Formulas Derived from Kconfig Models for Research on Highly Configurable Software

  • 1. Universidad Nacional de Educación a Distancia
  • 2. Universidad Nacional de Educación a Distancia (UNED)
  • 3. Universidad Nacional de Educación a Distancia

Description

A Dataset of Raw DIMACS Formulas Derived from Kconfig Models

This repository contains 5,476 raw DIMACS-encoded Boolean formulas derived from the Kconfig variability models of nine open-source systems. Each model is translated into a Boolean formula by two state-of-the-art tools, KconfigReader and KMax, which yields two DIMACS files per model (2,738 models × 2 translators = 5,476 files).

All formulas are provided without any preprocessing (no backbone simplification, no atomic-set reduction, etc.). This makes the dataset suitable for benchmarking preprocessing algorithms themselves, as well as for evaluating how raw formula complexity affects downstream reasoning tasks such as SAT solving, model counting, and configuration sampling.

Systems

System          Description
axTLS           Embedded TLS library
Buildroot       Embedded Linux build system
BusyBox         Unix utilities for embedded systems
EmbToolkit      Embedded systems toolkit
Freetz-NG       Fritz!Box firmware modification
L4Re            L4 Runtime Environment (microkernel-based OS framework)
Linux kernel    43 architectures (Alpha, ARC, ARM, ARM64, i386, x86_64, ...)
Toybox          Command-line utilities
uClibc          Embedded C library

The dataset covers multiple tagged releases per system. Formulas range from 22 to 73,719 variables and from 18 to 7,406,667 clauses.

Repository Structure

.
├── dimacs/             # 5,476 DIMACS files
│   ├── Axtls__1-0-0__KConfigReader.dimacs
│   ├── Axtls__1-0-0__KMax.dimacs
│   ├── ...
│   └── LinuxX8664__6-5__KMax.dimacs
├── summary.csv         # Per-formula metrics
├── replication.sh      # Bash script to regenerate the dataset
└── README.md

File Naming Convention

Each DIMACS file follows the pattern:

{System}__{Version}__{Translator}.dimacs
  • System: system name, with architecture appended for Linux (e.g., LinuxArm64)
  • Version: release tag with dots replaced by hyphens (e.g., 6-5 for version 6.5)
  • Translator: KConfigReader or KMax

Example: LinuxArm64__6-5__KMax.dimacs
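Because the naming convention is regular, file names can be parsed mechanically. The sketch below is a hypothetical helper, not part of the dataset's tooling: it splits a name on the double-underscore separator, restores the dotted version, and checks that every (system, version) pair has one file per translator, as the 2,738 × 2 = 5,476 count implies. It assumes that system names and version tags never contain `__` and that every hyphen in the version field replaced a dot.

```python
from collections import defaultdict
from pathlib import PurePath
from typing import Iterable, List, NamedTuple, Tuple

TRANSLATORS = {"KConfigReader", "KMax"}


class FormulaName(NamedTuple):
    system: str       # e.g. "LinuxArm64"
    version: str      # dotted form, e.g. "6.5"
    translator: str   # "KConfigReader" or "KMax"


def parse_name(filename: str) -> FormulaName:
    """Split '{System}__{Version}__{Translator}.dimacs' into its parts."""
    path = PurePath(filename)
    if path.suffix != ".dimacs":
        raise ValueError(f"not a .dimacs file: {filename}")
    parts = path.stem.split("__")
    if len(parts) != 3:
        raise ValueError(f"unexpected name layout: {filename}")
    system, version, translator = parts
    if translator not in TRANSLATORS:
        raise ValueError(f"unknown translator: {translator}")
    # Assumption: every hyphen in the version field stands for a dot.
    return FormulaName(system, version.replace("-", "."), translator)


def unpaired_models(filenames: Iterable[str]) -> List[Tuple[str, str]]:
    """Return the (system, version) pairs that lack one of the two translations."""
    seen = defaultdict(set)
    for name in filenames:
        parsed = parse_name(name)
        seen[(parsed.system, parsed.version)].add(parsed.translator)
    return sorted(key for key, tools in seen.items() if tools != TRANSLATORS)


if __name__ == "__main__":
    names = [
        "Axtls__1-0-0__KConfigReader.dimacs",
        "Axtls__1-0-0__KMax.dimacs",
        "LinuxArm64__6-5__KMax.dimacs",
    ]
    print(parse_name(names[2]))
    print(unpaired_models(names))
```

Applied to the names of all files in `dimacs/`, `unpaired_models` should return an empty list if every model was translated by both tools.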

summary.csv

A CSV file with one row per DIMACS file and the following columns:

Column                      Description
Name                        DIMACS file name
NumberOfVariables           Number of Boolean variables
NumberOfClauses             Number of clauses
NumberOfCoreFeatures        Number of variables that must be true in every satisfying assignment
NumberOfDeadFeatures        Number of variables that must be false in every satisfying assignment
MedianLiteralsPerClause     Median number of literals per clause
Inf95CovIntLitsPerClause    Lower bound of the 95% coverage interval of literals per clause
Sup95CovIntLitsPerClause    Upper bound of the 95% coverage interval of literals per clause
Version                     System version in dotted form (e.g., 6.5)
ToolToGetTheFormula         Translator used (KConfigReader or KMax)
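To make the column definitions concrete, the sketch below recomputes several of them for a formula given as a list of clauses (lists of non-zero integers). It is not the authors' procedure, and two choices are assumptions: the 95% coverage interval is taken as the 2.5th and 97.5th nearest-rank percentiles of the literals-per-clause distribution, and core and dead variables are found by brute-force enumeration over all variables, which only works for toy formulas. On real formulas this is a backbone computation normally done with a SAT solver, and the dataset may count only variables that correspond to Kconfig options rather than every variable.

```python
import itertools
import math
import statistics
from typing import Dict, List, Tuple


def nearest_rank_percentile(sorted_values: List[int], p: float) -> int:
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def core_and_dead(num_vars: int, clauses: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Brute-force core (always true) and dead (always false) variables.

    Enumerates all 2**num_vars assignments, so it is only meant for toy formulas.
    """
    if num_vars > 20:
        raise ValueError("brute force is only meant for tiny formulas")
    always_true = [True] * num_vars
    always_false = [True] * num_vars
    satisfiable = False
    for values in itertools.product((False, True), repeat=num_vars):
        # values[i] is the truth value of variable i + 1
        if all(any((lit > 0) == values[abs(lit) - 1] for lit in clause) for clause in clauses):
            satisfiable = True
            for i, value in enumerate(values):
                always_true[i] = always_true[i] and value
                always_false[i] = always_false[i] and not value
    if not satisfiable:
        raise ValueError("formula is unsatisfiable")
    core = [i + 1 for i in range(num_vars) if always_true[i]]
    dead = [i + 1 for i in range(num_vars) if always_false[i]]
    return core, dead


def formula_metrics(num_vars: int, clauses: List[List[int]]) -> Dict[str, float]:
    """Compute summary.csv-style metrics for one (tiny) formula."""
    sizes = sorted(len(clause) for clause in clauses)
    core, dead = core_and_dead(num_vars, clauses)
    return {
        "NumberOfVariables": num_vars,
        "NumberOfClauses": len(clauses),
        "NumberOfCoreFeatures": len(core),
        "NumberOfDeadFeatures": len(dead),
        "MedianLiteralsPerClause": statistics.median(sizes),
        # Assumption: 95% coverage interval = 2.5th and 97.5th percentiles.
        "Inf95CovIntLitsPerClause": nearest_rank_percentile(sizes, 2.5),
        "Sup95CovIntLitsPerClause": nearest_rank_percentile(sizes, 97.5),
    }


if __name__ == "__main__":
    # Toy formula: x1, (not x1 or x2), not x3, (x1 or x2 or x3)
    print(formula_metrics(3, [[1], [-1, 2], [-3], [1, 2, 3]]))
```

In the toy formula, x1 and x2 are forced true and x3 is forced false, so it has two core variables and one dead variable.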

DIMACS Format

Each file encodes a Boolean formula in Conjunctive Normal Form (CNF). The format is:

  • Lines starting with c are comments mapping variable numbers to Kconfig option names.
  • The line p cnf <variables> <clauses> declares the number of variables and clauses.
  • Each subsequent line is a clause: a space-separated list of non-zero integers terminated by 0. A positive integer i stands for variable i (a positive literal); a negative integer -i stands for its negation.
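A minimal reader for this format could look like the sketch below; it is not the tooling used to build the dataset. The comment layout `c <variable number> <option name>` is an assumption, so the regular expression may need adjusting to the actual files. Clauses are collected up to each terminating 0 rather than one per line, since standard DIMACS allows a clause to span lines.

```python
import re
from typing import Dict, List, NamedTuple, Optional

# Assumed comment layout: "c <variable number> <Kconfig option name>".
COMMENT_RE = re.compile(r"^c\s+(\d+)\s+(\S+)")


class CNF(NamedTuple):
    num_vars: int
    clauses: List[List[int]]
    names: Dict[int, str]  # variable number -> option name (from comments)


def parse_dimacs(text: str) -> CNF:
    """Parse DIMACS CNF text into the declared variable count, clauses and names."""
    num_vars: Optional[int] = None
    num_clauses: Optional[int] = None
    names: Dict[int, str] = {}
    clauses: List[List[int]] = []
    current: List[int] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("c"):
            match = COMMENT_RE.match(line)
            if match:  # comments in any other layout are ignored
                names[int(match.group(1))] = match.group(2)
            continue
        if line.startswith("p"):
            _, fmt, nvars, nclauses = line.split()
            if fmt != "cnf":
                raise ValueError(f"unsupported problem type: {fmt}")
            num_vars, num_clauses = int(nvars), int(nclauses)
            continue
        # Literal tokens: a clause ends at 0 and may in principle span lines.
        for token in line.split():
            literal = int(token)
            if literal == 0:
                clauses.append(current)
                current = []
            else:
                current.append(literal)
    if num_vars is None:
        raise ValueError("missing 'p cnf' header")
    if current:
        raise ValueError("last clause is not terminated by 0")
    if len(clauses) != num_clauses:
        raise ValueError(f"header declares {num_clauses} clauses, found {len(clauses)}")
    return CNF(num_vars, clauses, names)


if __name__ == "__main__":
    # Hypothetical option names, for illustration only.
    example = "c 1 FOO\nc 2 BAR\np cnf 2 2\n1 -2 0\n2 0\n"
    print(parse_dimacs(example))
```

This version reads the whole text into memory; for the largest formulas (over 7 million clauses) a line-streaming variant may be preferable.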

Replication

The replication.sh script regenerates all DIMACS files from scratch using torte. It requires Docker or Podman:

bash replication.sh

torte clones each system's source repository at every tagged version, runs KconfigReader and KMax inside containers, transforms the output to SMT-LIB 2 format with FeatJAR, and converts it to DIMACS using the Z3 solver.

Authors

Ruben Heradio, Cristina Cerrada, Ismael Abad, Ernesto Aranda-Escolastico, Juan Jose Escribano, David Fernandez-Amoros

Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain

Acknowledgements

We thank Elias Kuiter for his help and support in using torte.

This work has been funded by FEDER/Spanish Ministry of Science, Innovation and Universities (MCIN)/Agencia Estatal de Investigación (AEI) under project COSY (PID2022-142043NB-I00).

License

MIT License.

Files

dataset.zip (4.8 GB)
md5:740cd3ffc6cb0a9ebc3cafac80e5e75d
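Before unpacking, the download can be checked against the md5 checksum listed for dataset.zip (on most Linux systems, `md5sum dataset.zip` does the same). The sketch below hashes the file in chunks so the 4.8 GB archive never has to fit in memory; the function and constant names are illustrative.

```python
import hashlib

# Checksum listed for dataset.zip on this record.
EXPECTED_MD5 = "740cd3ffc6cb0a9ebc3cafac80e5e75d"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 digest of a file, read in chunks (1 MiB by default)."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `md5_of_file("dataset.zip") == EXPECTED_MD5` should be `True` for an intact download.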