The AInception Dataset
Authors/Creators
Description
The AInception dataset contains system, network, and cyber-physical logs generated from three simulated military cyber-defence storylines: SL100, SL300, and SL700. The simulations were produced within the European Defence Fund project AInception (grant agreement 101103385) and model realistic operational environments involving benign behaviour, adversarial activity, and multi-step attack chains.
Simulation of storylines
The AInception project developed a military scenario comprising six connected storylines, which are described in the project report. Three of these storylines are used in this dataset:
- SL100 (UAV border surveillance): National borders are monitored with unmanned aerial vehicles (UAVs). For intelligence purposes, the red actor first compromises the ICT systems of the facility that controls the drones, and later uses this access to crash a drone as a political signal. The SL100 simulation contains benign UAV patrol missions, followed by the compromise of a Windows operator machine and an attack that interrupts a UAV mission. It includes system logs, Suricata alerts and NetFlow records, UAV flight logs, and a STIX-based infrastructure graph.
- SL300 (Non-combatant evacuation operation): Diplomatic and government personnel are evacuated in a land-based non-combatant evacuation operation. A deployed battlegroup sends armoured vehicles to escort buses carrying the personnel to an airport, from which they are evacuated by air. Compromised vehicle systems are leveraged to gain intelligence and disrupt the operation. The SL300 environment consists of a Windows-based infrastructure with Active Directory, C2 systems, email servers, and simulated vehicles that communicate with the HQ over satellite. The dataset includes eight multi-day simulations featuring benign operational behaviour and multiple attack variants covering initial access, lateral movement, and service disruption. Logs include Windows Event Logs and Sysmon, Linux audit logs, Suricata NetFlow, simulated user actions, and attack tool telemetry.
- SL700 (Battlegroup at home base): The garrison where the SL300 battlegroup has its home base is compromised before deployment. A cyber attack against the physical access control and surveillance systems is used to enable physical access for the red actor, and this access is later leveraged in SL300. The SL700 environment contains surveillance systems, firewalls, routers, and specialised infrastructure. The simulations cover reconnaissance, exploitation, privilege escalation, persistence techniques, and manipulation of firewall rules to disrupt CCTV video feeds. Each run provides raw host logs, network data, AttackMate timelines, and labelled subsets.
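SL100 and SL700 ship their infrastructure as a STIX-based graph. The sketch below shows one way to turn such a graph into an adjacency list, assuming it is stored as a standard STIX 2.1 bundle in JSON: a `bundle` object whose `objects` array contains Relationship Objects with `source_ref`, `target_ref`, and `relationship_type`. The function name `load_stix_graph` and the example objects (ids, names) are hypothetical and simplified; real STIX identifiers contain UUIDs.

```python
import json
from collections import defaultdict


def load_stix_graph(bundle_json):
    """Split a STIX 2.1 bundle into nodes and directed edges.

    Nodes: every non-relationship object, keyed by its STIX id.
    Edges: source_ref -> list of (relationship_type, target_ref), built from
    STIX Relationship Objects (type "relationship").
    """
    bundle = json.loads(bundle_json)
    nodes = {}
    edges = defaultdict(list)
    for obj in bundle.get("objects", []):
        if obj.get("type") == "relationship":
            edges[obj["source_ref"]].append(
                (obj["relationship_type"], obj["target_ref"])
            )
        else:
            nodes[obj["id"]] = obj
    return nodes, dict(edges)


# Hypothetical, simplified bundle (placeholder ids instead of UUIDs).
example_bundle = json.dumps({
    "type": "bundle",
    "id": "bundle--example",
    "objects": [
        {"type": "infrastructure", "spec_version": "2.1",
         "id": "infrastructure--facility", "name": "UAV control facility"},
        {"type": "infrastructure", "spec_version": "2.1",
         "id": "infrastructure--operator", "name": "Windows operator machine"},
        {"type": "relationship", "spec_version": "2.1",
         "id": "relationship--r1", "relationship_type": "consists-of",
         "source_ref": "infrastructure--facility",
         "target_ref": "infrastructure--operator"},
    ],
})

nodes, edges = load_stix_graph(example_bundle)
for source, targets in edges.items():
    for rel_type, target in targets:
        # Fall back to the raw id if an endpoint is not defined in the bundle.
        src_name = nodes.get(source, {}).get("name", source)
        dst_name = nodes.get(target, {}).get("name", target)
        print(src_name, rel_type, dst_name)
# prints: UAV control facility consists-of Windows operator machine
```

Keeping relationships as a plain dictionary avoids extra dependencies; the same edge list can be handed to a graph library if path queries are needed.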
Content
Across the three storylines, the dataset includes:
- Host logs (Windows Event Logs, Sysmon, Linux audit logs, application logs)
- Network telemetry (Suricata alerts, NetFlow, PCAP fragments)
- UAV flight and mission logs (SL100)
- Simulated user activity traces (SL300)
- Structured attack timelines with MITRE ATT&CK mapping
- Infrastructure descriptions as STIX 2.1 graphs (SL100, SL700)
- Indicators of Compromise (IOCs) and STIX objects (SL300)
- Labels distinguishing malicious from benign events (where available)
- Alerts
- Alert graphs (SL300, variants 2 and 5)
- Knowledge graph (SL300, variant 5)
- Attack-defence graphs in MAL, the Meta Attack Language (SL300)
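For the Suricata telemetry, a common first pass is to count event types and the most frequent alert signatures. The sketch below assumes the Suricata output is in EVE JSON format (one JSON object per line, with an `event_type` field and, for alerts, an `alert.signature` field); the per-simulation README should be checked for the actual file layout. The function `summarise_eve` and the sample records are hypothetical.

```python
import json
from collections import Counter


def summarise_eve(lines):
    """Count event types and alert signatures in Suricata EVE JSON lines.

    `lines` is any iterable of strings, e.g. an open text file, so large
    logs are processed one line at a time.
    """
    event_types = Counter()
    signatures = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed lines
        if not isinstance(event, dict):
            continue  # EVE records are JSON objects; ignore anything else
        event_type = event.get("event_type", "unknown")
        event_types[event_type] += 1
        if event_type == "alert":
            alert = event.get("alert") or {}
            signatures[alert.get("signature", "<no signature>")] += 1
    return event_types, signatures


# Hypothetical sample records in EVE JSON style (made-up IPs and signature).
sample = [
    '{"event_type": "flow", "src_ip": "10.0.0.5", "dest_ip": "10.0.0.9"}',
    '{"event_type": "alert", "alert": {"signature": "Example scan rule", "severity": 2}}',
    '{"event_type": "alert", "alert": {"signature": "Example scan rule", "severity": 2}}',
]

types, sigs = summarise_eve(sample)
print(types.most_common())  # [('alert', 2), ('flow', 1)]
print(sigs.most_common(1))  # [('Example scan rule', 2)]
```

Because the function accepts any iterable of lines, an open file object can be passed directly, so logs with millions of records are streamed rather than loaded into memory at once.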
Scale
The dataset comprises 15 complete simulations, each a variant of one of the storylines above. Individual runs last from hours (SL700) up to six days (SL300). Depending on the storyline, the logs total tens to hundreds of millions of events.
Purpose
This dataset is intended for cybersecurity research, particularly in the military domain, including:
- anomaly detection
- intrusion detection
- cyber-physical system security
- graph-based threat analysis
- behavioural modelling and concept drift research
- alert analysis and triage
- situational awareness
- response generation
It provides realistic, diverse, and high-fidelity data aligned with operational military scenarios.
A PDF file with additional details on the dataset and the underlying simulations is included. Each variant (simulation) is distributed as a separate ZIP file that contains a README with further information.
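Since each simulation ZIP contains a README and the archives are several gigabytes each, it can be convenient to inspect them without unpacking. A minimal sketch with Python's standard `zipfile` module follows; the helper names are hypothetical, and `read_readme` assumes the README's file name starts with "README" (any case), which may not match the actual archives.

```python
import io
import zipfile


def list_members(archive):
    """Return (name, uncompressed size in bytes) for every entry in a ZIP.

    `archive` may be a path or a seekable binary file object.
    """
    with zipfile.ZipFile(archive) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]


def read_readme(archive):
    """Return the text of the first entry whose base name starts with "readme".

    Assumption: the README's name starts with "README" (any case).
    """
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if name.rsplit("/", 1)[-1].lower().startswith("readme"):
                return zf.read(name).decode("utf-8", errors="replace")
    return None


def iter_lines(archive, member):
    """Stream one member line by line without extracting the archive."""
    with zipfile.ZipFile(archive) as zf:
        with io.TextIOWrapper(zf.open(member), encoding="utf-8",
                              errors="replace") as text:
            for line in text:
                # Universal-newline mode turns "\r\n" into "\n" before stripping.
                yield line.rstrip("\n")
```

For example, `print(read_readme("SL300_variant_1.zip"))` (a hypothetical archive name) would show a simulation's README, and `iter_lines` can feed a log file straight into a parser without writing tens of gigabytes to disk.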
Files
The AInception datasets.pdf
Total size: 133.6 GB

| MD5 checksum | Size |
|---|---|
| 3d7841d3eaf6d4e297cc75aa9a33fe3d | 31.6 MB |
| f786920d5d0f25b87ec874f52c5a9f09 | 15.1 GB |
| 5e3f48ea5082ce050ea2b6fd45c250b6 | 21.1 GB |
| 6c1b8ae1fc7397ef0546eeb36f840835 | 12.7 GB |
| 4930200a752c0a06e064325e6b7aca3a | 13.7 GB |
| 83a52d0eb98049c4f411159d92894bd3 | 11.3 GB |
| 13a2fcba6a4690ca514278ce6ad2d4cf | 9.7 GB |
| 482204971906b47e6b96449a8616f36a | 13.5 GB |
| 8b7a7eea81ff55db146c3a25ee5efe9c | 14.2 GB |
| 90fbb9c4c94ba839bbdb84fc0d058a8c | 3.9 GB |
| af983fcf543629166bc840944d9f5870 | 3.7 GB |
| d95b3e5cd5c29536c75f6f8f91f39f60 | 3.9 GB |
| e81e2819cbb1ccda76fda080293385be | 3.6 GB |
| 4a14133c837cd26b540be5d65a04ed6f | 3.7 GB |
| 8852767f1bc07b4f308483c448a31849 | 3.4 GB |
| ef597587efb3c870ad8f3182cf6293be | 792.2 kB |
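The MD5 checksums listed for each file can be used to confirm that a download is complete and uncorrupted. A minimal sketch using Python's standard `hashlib`, reading the file in chunks so that multi-gigabyte archives never need to fit in memory; the helper names are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=8 * 1024 * 1024):
    """Return the hex MD5 digest of a file, read in 8 MiB chunks by default."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """True if the file's MD5 matches the expected hex string (case-insensitive)."""
    return md5_of_file(path) == expected.strip().lower()


def verify_all(expected_by_path):
    """Map each path to True/False, given a {path: expected_md5} dictionary."""
    return {path: verify_md5(path, md5) for path, md5 in expected_by_path.items()}
```

For example, `verify_md5("downloaded_archive.zip", "f786920d5d0f25b87ec874f52c5a9f09")` checks a local copy (hypothetical file name) against the 15.1 GB entry above. MD5 is adequate for catching transfer errors, though it does not protect against deliberate tampering.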