Data4Cyber: A Labeled Cyber-Physical Dataset for Distribution-Grid Management Systems Under Representative OT Attacks
Description
Data4Cyber is a multi-modal, scenario-based dataset for cyber-physical security
research on distribution-grid management systems with distributed energy
resources (DER). The release contains seven primary scenarios (S0 to S6) covering
a benign baseline, two Industroyer-style Modbus manipulations against PV and
BSS, three ARP/MITM false-data-injection variants on meter telemetry, and one
MQTT supply-chain compromise of the price signal. One alternate scenario
(S1_industroyer_pv_alt) is shipped as additional reference material outside the
primary analysis set.
Each scenario folder contains a synchronized 1 Hz process telemetry table
(dataset.csv with 151 to 158 columns), benign-only and attack-only splits, an
IPAL-compatible state log (state.jsonl.gz), full and split OT packet captures
(pcapng), attack-phase annotations, IP/MAC-to-role and Modbus-register
semantic mappings, and per-scenario plots. Labels include a binary
attack_active indicator, a single-label attack_phase token, and a multi-label
attack_phase_all field for overlapping phases. In aggregate the dataset holds
14,354 rows (5,880 benign and 8,474 attack), collected on 2026-03-05 and
2026-03-06.
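As a rough illustration of how the labels described above might be consumed, the following sketch tallies benign and attack rows from a dataset.csv-style table using only the Python standard library. It assumes that attack_active is encoded as 0/1 or true/false, which the description does not confirm. The timestamp column and the phase tokens in the sample are hypothetical stand-ins, not values taken from the release.

```python
import csv
import io
from collections import Counter


def summarize_labels(csv_file):
    """Count benign and attack rows and tally attack_phase over attack rows.

    Assumption: attack_active is stored as 0/1 or true/false text.
    """
    benign = 0
    attack = 0
    phases = Counter()
    for row in csv.DictReader(csv_file):
        if row["attack_active"].strip().lower() in {"1", "true"}:
            attack += 1
            phases[row["attack_phase"]] += 1
        else:
            benign += 1
    return benign, attack, phases


# Hypothetical in-memory stand-in for a scenario's dataset.csv;
# the real table has 151 to 158 columns and its own phase tokens.
sample = io.StringIO(
    "timestamp,attack_active,attack_phase\n"
    "0,0,none\n"
    "1,1,recon\n"
    "2,1,manipulation\n"
    "3,0,none\n"
)
benign, attack, phases = summarize_labels(sample)
print(benign, attack, dict(phases))
```

For a real scenario folder, the same function could be called on `open(path, newline="")`. Summing the per-scenario counts gives a quick sanity check against the reported totals of 5,880 benign and 8,474 attack rows.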
The dataset supports anomaly detection, intrusion detection, phase-aware
sequence labeling, and cross-layer cause-effect analysis. Baseline IDS results
across twelve IPAL detector implementations are reported in the companion
publication. The release is standalone: all files required for interpretation
and reproduction are inside the archive.
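The state log (state.jsonl.gz) can presumably be read as gzip-compressed JSON Lines, one JSON object per line. The sketch below shows one way to iterate over such a file with the standard library. The record keys in the demo (timestamp, state, pv_power) are hypothetical placeholders. The actual fields are defined by the release's own documentation and the IPAL format.

```python
import gzip
import io
import json


def iter_states(source):
    """Yield one dict per non-empty line of a gzip-compressed JSON Lines file.

    `source` may be a filesystem path or a binary file object.
    """
    with gzip.open(source, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


# Build a tiny in-memory stand-in for state.jsonl.gz (hypothetical keys).
buf = io.BytesIO()
with gzip.open(buf, "wt", encoding="utf-8") as fh:
    fh.write(json.dumps({"timestamp": 0, "state": {"pv_power": 1.5}}) + "\n")
    fh.write(json.dumps({"timestamp": 1, "state": {"pv_power": 1.4}}) + "\n")
buf.seek(0)

records = list(iter_states(buf))
print(len(records), records[0]["timestamp"])
```

Streaming line by line keeps memory use flat, which matters less for a 1 Hz log of this size but is a reasonable default for larger captures.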
Files
data4cyber_dataset.zip (134.0 MB)
md5:a540979c63120c9a0295ff974933580f
Additional details
Dates
- Collected: 2026-03-05/2026-03-06