Published November 26, 2025 | Version v1
Dataset Open

The AInception Dataset

Description

The AInception dataset contains system, network, and cyber-physical logs generated from three simulated military cyber-defence storylines: SL100, SL300, and SL700. These were produced within the European Defence Fund project AInception (GA 101103385) and model realistic operational environments involving benign behaviour, adversarial activity, and multi-step attack chains.

Simulation of storylines

In AInception, a military scenario with six connected storylines has been developed. These are described in this report. Three of these storylines are used in this dataset:

  • SL100 — UAV border surveillance: In SL100, national borders are monitored with unmanned aerial vehicles (UAVs). The red actor initially compromises the ICT systems of the facility that controls the drones in order to gather intelligence. This access is later used to crash a drone as a political signal. The included SL100 dataset/simulation contains benign UAV patrol missions followed by the compromise of a Windows operator machine and an attack that interrupts the UAV mission. Included are system logs, Suricata alerts and NetFlow records, UAV flight logs, and a STIX-based infrastructure graph.

  • SL300 — Non-combatant evacuation operation: In SL300, diplomatic and government personnel are evacuated through a land-based non-combatant evacuation operation. A battlegroup is deployed and sends armoured vehicles to escort buses with personnel to an airport for air evacuation. Compromised vehicle systems are leveraged to gain intelligence and disrupt the operation. The included SL300 dataset/simulation contains a Windows-based infrastructure, Active Directory, C2 systems, email servers, and simulated vehicles communicating with the HQ over satellite. The dataset includes eight multi-day simulations featuring benign operational behaviour and multiple attack variants covering initial access, lateral movement, and service disruption. Logs include Windows Event Logs and Sysmon, Linux audit logs, Suricata NetFlow records, simulated user actions, and attack tool telemetry.

  • SL700 — Battlegroup at home base: In SL700, the garrison where the battlegroup of SL300 has its home base is compromised before deployment. A cyber attack against physical access and surveillance systems gives the red actor physical access. This access is leveraged in SL300. The included SL700 dataset/simulation contains surveillance systems, firewalls, routers, and specialised infrastructure. It includes simulations of reconnaissance, exploitation, privilege escalation, persistence techniques, and manipulation of firewall rules to disrupt CCTV video feeds. Each run provides raw host logs, network data, AttackMate timelines, and labelled subsets.

Content

Across the three storylines, the dataset includes:

  • Host logs (Windows Event Logs, Sysmon, Linux audit logs, application logs)

  • Network telemetry (Suricata alerts, NetFlow, PCAP fragments)

  • UAV flight and mission logs (SL100)

  • Simulated user activity traces (SL300)

  • Structured attack timelines with MITRE ATT&CK mapping

  • Infrastructure descriptions in STIX 2.1 graph format (SL100, SL700)

  • Indicators of Compromise (IOCs) and STIX objects (SL300)

  • Labelled/annotated malicious vs benign events (where available)

  • Alerts

  • Alert graphs (SL300; variant 2 and variant 5)

  • Knowledge graph (SL300; variant 5)

  • Attack-defence graphs in MAL (SL300)
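
The STIX 2.1 infrastructure descriptions listed above can be read with a standard JSON parser. The following is a minimal sketch, assuming the graph is stored as a STIX 2.1 bundle in a JSON file; the file layout inside the archives is not described here, and the example bundle, object names, and identifiers are hand-written for illustration and are not taken from the dataset. It relies only on the `relationship` object fields defined by STIX 2.1 (`source_ref`, `target_ref`, `relationship_type`).

```python
import json


def load_bundle(path):
    """Load a STIX 2.1 bundle from a JSON file (the path is supplied by the caller)."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def stix_edges(bundle):
    """Return (source_label, relationship_type, target_label) triples from a bundle."""
    objects = bundle.get("objects", [])
    # Map each object id to a readable label: its name if present, else its id.
    labels = {obj["id"]: obj.get("name", obj["id"]) for obj in objects if "id" in obj}
    edges = []
    for obj in objects:
        if obj.get("type") != "relationship":
            continue
        src, dst = obj["source_ref"], obj["target_ref"]
        edges.append((labels.get(src, src), obj["relationship_type"], labels.get(dst, dst)))
    return edges


# Hypothetical two-node bundle, for illustration only.
example_bundle = {
    "type": "bundle",
    "id": "bundle--00000000-0000-4000-8000-000000000001",
    "objects": [
        {
            "type": "infrastructure",
            "id": "infrastructure--00000000-0000-4000-8000-000000000002",
            "name": "operator-workstation",
        },
        {
            "type": "infrastructure",
            "id": "infrastructure--00000000-0000-4000-8000-000000000003",
            "name": "uav-ground-station",
        },
        {
            "type": "relationship",
            "id": "relationship--00000000-0000-4000-8000-000000000004",
            "relationship_type": "communicates-with",
            "source_ref": "infrastructure--00000000-0000-4000-8000-000000000002",
            "target_ref": "infrastructure--00000000-0000-4000-8000-000000000003",
        },
    ],
}

for edge in stix_edges(example_bundle):
    print(edge)
```

The resulting triples can be passed to any graph library for the graph-based analyses; unresolved references fall back to the raw STIX identifier rather than raising an error.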

Scale

The dataset spans 15 complete simulations, each a variant of one of the above storylines. Individual runs range from hours (SL700) to six days (SL300). Total logs include tens to hundreds of millions of events, depending on the storyline.

Purpose

This dataset is intended for cybersecurity research, particularly in the military domain, including:

  • anomaly detection

  • intrusion detection

  • cyber-physical system security

  • graph-based threat analysis

  • behavioural modelling and concept drift research

  • alert analysis and triage

  • situational awareness

  • response generation

It provides realistic, diverse, and high-fidelity datasets aligned with operational military scenarios.

A PDF file is included with additional details on the dataset and the underlying simulation. Each variant/simulation (represented as a separate ZIP file) contains a README file with further information.
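
Because the archives are large, it can be convenient to read a variant's README without extracting the whole ZIP file. The sketch below assumes only that the README is a text file whose name starts with "readme" (case-insensitive) somewhere inside the archive; the exact file name and location are not stated here, and `read_readme` is a hypothetical helper.

```python
import zipfile
from pathlib import PurePosixPath


def read_readme(zip_path):
    """Return (member_name, text) for the first README-like file in the archive, or None."""
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            if member.endswith("/"):
                continue  # skip directory entries
            # Assumption: the README's file name starts with "readme" in any case.
            if PurePosixPath(member).name.lower().startswith("readme"):
                text = archive.read(member).decode("utf-8", errors="replace")
                return member, text
    return None
```

Calling `read_readme` with the path of a downloaded archive returns the member name and its text, or `None` if no matching file is found.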

Files

The AInception datasets.pdf

Files (133.6 GB)

MD5 checksum                      Size
3d7841d3eaf6d4e297cc75aa9a33fe3d  31.6 MB
f786920d5d0f25b87ec874f52c5a9f09  15.1 GB
5e3f48ea5082ce050ea2b6fd45c250b6  21.1 GB
6c1b8ae1fc7397ef0546eeb36f840835  12.7 GB
4930200a752c0a06e064325e6b7aca3a  13.7 GB
83a52d0eb98049c4f411159d92894bd3  11.3 GB
13a2fcba6a4690ca514278ce6ad2d4cf  9.7 GB
482204971906b47e6b96449a8616f36a  13.5 GB
8b7a7eea81ff55db146c3a25ee5efe9c  14.2 GB
90fbb9c4c94ba839bbdb84fc0d058a8c  3.9 GB
af983fcf543629166bc840944d9f5870  3.7 GB
d95b3e5cd5c29536c75f6f8f91f39f60  3.9 GB
e81e2819cbb1ccda76fda080293385be  3.6 GB
4a14133c837cd26b540be5d65a04ed6f  3.7 GB
8852767f1bc07b4f308483c448a31849  3.4 GB
ef597587efb3c870ad8f3182cf6293be  792.2 kB
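
Given the size of the archives, it is worth checking each download against its published MD5 checksum. The following is a minimal sketch using Python's standard `hashlib` module; the helper names and the local file path passed to them are hypothetical, and the file is read in chunks so that multi-gigabyte archives do not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 matches the published checksum.

    Accepts the checksum with or without a leading "md5:" prefix.
    """
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again before extraction.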