Published August 18, 2023 | Version 1
Dataset Open

AIT Alert Data Set

  • AIT Austrian Institute of Technology

Description

This repository contains the AIT Alert Data Set (AIT-ADS), a collection of synthetic alerts suitable for evaluating approaches for alert aggregation, alert correlation, alert filtering, and attack graph generation. The alerts were forensically generated from the AIT Log Data Set V2 (AIT-LDSv2) and originate from three intrusion detection systems (IDS): Suricata, Wazuh, and AMiner. The data set comprises eight scenarios, each of which was targeted by a multi-step attack with steps such as scans, web application exploits, password cracking, remote command execution, and privilege escalation. Each scenario and attack chain has certain variations, so attack manifestations and the resulting alert sequences differ between scenarios. This makes the data set suitable for developing and evaluating approaches that compute similarities of attack chains or merge them into meta-alerts.

Since only a few benchmark alert data sets are publicly available, the AIT-ADS was developed to address common issues in the research domain of multi-step attack analysis. Specifically, the alert data set contains:

  • many false positives caused by normal user behavior (e.g., user login attempts or software updates),
  • heterogeneous alert formats (although all alerts are in JSON format, their fields differ for each IDS),
  • repeated executions of attacks according to an attack plan,
  • alerts collected from diverse log sources (application logs and network traffic) and from all components in the network (mail server, web server, DNS, firewall, file share, etc.),
  • labels for attack phases.

For more information on how this alert data set was generated, see the paper accompanying this data set [1] or our GitHub repository. More information on the original log data set, including a detailed description of scenarios and attacks, can be found in [2].
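Because the fields differ for each IDS, one way to work with the alerts is to map every alert to a small shared record before aggregating or correlating them. The following is a minimal sketch only: the field names it reads (`timestamp`, `rule.description`, and `data.alert.signature` for alerts in the Wazuh files; `LogData.Timestamps` and `AnalysisComponent.AnalysisComponentName` for AMiner alerts) are assumptions not documented on this page and should be checked against the actual files. The `Alert` record and the `normalize` function are hypothetical helpers.

```python
from datetime import datetime, timezone
from typing import Any, NamedTuple


class Alert(NamedTuple):
    """Hypothetical common view of an alert, independent of the producing IDS."""
    time: datetime
    ids: str
    name: str


def parse_wazuh_time(ts: str) -> datetime:
    # Assumed timestamp layout, e.g. "2022-01-21T00:00:12.345+0000"
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f%z")


def normalize(raw: dict[str, Any], source_file: str) -> Alert:
    """Map one raw alert dict to an Alert; all field names are assumptions."""
    if source_file.endswith("_aminer.json"):
        # Assumed AMiner layout: epoch seconds in LogData.Timestamps
        ts = raw["LogData"]["Timestamps"][0]
        return Alert(datetime.fromtimestamp(ts, tz=timezone.utc), "AMiner",
                     raw["AnalysisComponent"]["AnalysisComponentName"])
    time = parse_wazuh_time(raw["timestamp"])
    # Assumed: Suricata alerts in the Wazuh files keep their signature here
    suricata = raw.get("data", {}).get("alert")
    if suricata is not None:
        return Alert(time, "Suricata", suricata["signature"])
    return Alert(time, "Wazuh", raw["rule"]["description"])
```

The sketch dispatches on the file name instead of on field presence, so AMiner alerts are never confused with Wazuh alerts even if some key names overlap.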

The alert data set contains two alert files for each of the eight scenarios and one file with the labels:

  • <scenario>_aminer.json contains alerts from AMiner IDS
  • <scenario>_wazuh.json contains alerts from Wazuh IDS and Suricata IDS
  • labels.csv contains the start and end times of attack phases in each scenario
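A minimal loader for the two alert files of one scenario could look as follows. It assumes that each `.json` file stores one JSON object per line (JSON Lines) rather than a single JSON array, and that `ait_ads.zip` was extracted to a local directory; `load_alerts` and `load_scenario` are hypothetical helper names.

```python
import json
from pathlib import Path


def load_alerts(path: Path) -> list[dict]:
    """Read one alert file, assuming one JSON object per line (JSON Lines)."""
    alerts = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                alerts.append(json.loads(line))
    return alerts


def load_scenario(data_dir: Path, scenario: str) -> dict[str, list[dict]]:
    """Load both alert files of one scenario, e.g. scenario="fox"."""
    return {
        "aminer": load_alerts(data_dir / f"{scenario}_aminer.json"),
        # The Wazuh file also contains the Suricata alerts
        "wazuh": load_alerts(data_dir / f"{scenario}_wazuh.json"),
    }
```

For example, `load_scenario(Path("ait_ads"), "fox")` would load the fox scenario if the archive was extracted to a directory named `ait_ads` (a hypothetical path). For the larger scenarios, a generator that yields alerts one at a time would avoid holding a whole file in memory.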

Besides false positives, the alerts in the AIT-ADS correspond to the following attacks:

  • Scans (nmap, WPScan, dirb)
  • Webshell upload (CVE-2020-24186)
  • Password cracking (John the Ripper)
  • Privilege escalation
  • Remote command execution
  • Data exfiltration (DNSteal) and stopped service
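The attack phases in `labels.csv` can be used to assign a phase to an alert by checking whether the alert's timestamp falls into a labeled interval. In this sketch the column names (`scenario`, `attack`, `start`, `end`) and the use of Unix epoch seconds are assumptions, not the documented layout of the file; `Phase`, `read_labels`, and `phase_of` are hypothetical helpers. A pure time-window match is only a coarse labeling, since alerts caused by normal user behavior can also occur during an attack phase.

```python
import csv
from typing import Iterable, NamedTuple, Optional


class Phase(NamedTuple):
    scenario: str
    name: str
    start: float  # assumed: Unix epoch seconds
    end: float


def read_labels(lines: Iterable[str]) -> list[Phase]:
    # Hypothetical column names; adapt them to the actual header of labels.csv
    return [
        Phase(row["scenario"], row["attack"], float(row["start"]), float(row["end"]))
        for row in csv.DictReader(lines)
    ]


def phase_of(scenario: str, t: float, phases: list[Phase]) -> Optional[str]:
    """Return the first attack phase whose interval contains t, else None."""
    for p in phases:
        if p.scenario == scenario and p.start <= t <= p.end:
            return p.name
    return None
```

With a real file, `read_labels(open("labels.csv", newline=""))` would return the phase list, and alerts for which `phase_of` returns `None` fall outside every labeled attack phase.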

The data set contains 2,655,821 alerts in total, of which 2,293,628 originate from Wazuh, 306,635 from Suricata, and 55,558 from AMiner. The number of alerts per scenario is as follows:

  • fox: 473,104
  • harrison: 593,948
  • russellmitchell: 45,544
  • santos: 130,779
  • shaw: 70,782
  • wardbeck: 91,257
  • wheeler: 616,161
  • wilson: 634,246
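The stated counts are consistent: both the per-IDS and the per-scenario numbers add up to the total, which a few lines of Python confirm. The sketch also computes each IDS's share of all alerts.

```python
# Alert counts as stated on this page
per_ids = {"Wazuh": 2_293_628, "Suricata": 306_635, "AMiner": 55_558}
per_scenario = {
    "fox": 473_104, "harrison": 593_948, "russellmitchell": 45_544,
    "santos": 130_779, "shaw": 70_782, "wardbeck": 91_257,
    "wheeler": 616_161, "wilson": 634_246,
}
TOTAL = 2_655_821

# Both breakdowns must sum to the total
assert sum(per_ids.values()) == TOTAL
assert sum(per_scenario.values()) == TOTAL

# Share of each IDS in percent, rounded to one decimal
shares = {ids: round(100 * n / TOTAL, 1) for ids, n in per_ids.items()}
print(shares)  # {'Wazuh': 86.4, 'Suricata': 11.5, 'AMiner': 2.1}
```

Wazuh thus accounts for roughly 86% of all alerts, while AMiner contributes about 2%.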

Acknowledgements: Partially funded by the European Defence Fund (EDF) project AInception (101103385) and the FFG project PRESENT (FO999899544).

If you use the AIT-ADS, please cite the following publications:

[1] Landauer, M., Skopik, F., Wurzenberger, M. (2023): Introducing a New Alert Data Set for Multi-Step Attack Analysis. arXiv:2308.12627.

[2] Landauer, M., Skopik, F., Frank, M., Hotwagner, W., Wurzenberger, M., Rauber, A. (2023): Maintainable Log Datasets for Evaluation of Intrusion Detection Systems. IEEE Transactions on Dependable and Secure Computing, vol. 20, no. 4, pp. 3466-3482.

Files (96.2 MB)

  • ait_ads.zip: 96.2 MB, md5:43db6b1f0996e0024befd617706c50e9
  • (name not given): 3.7 kB, md5:60ff33796c77fd2136c4d1a4bc841bd9
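To check that a download is intact, its MD5 digest can be compared with the listed checksum. This sketch reads the file in 1 MiB chunks so the 96.2 MB archive is never held in memory at once; `md5sum` is a hypothetical helper, and the file location in the usage comment is an assumption.

```python
import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        # iter() with a sentinel stops once read() returns b"" at end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Checksum listed above for the 96.2 MB archive
EXPECTED = "43db6b1f0996e0024befd617706c50e9"

# Usage, assuming ait_ads.zip was downloaded to the working directory:
# assert md5sum(Path("ait_ads.zip")) == EXPECTED
```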