A Device-Level IoT Network Traffic Dataset with Distributed Capture and Non-IID Characteristics
Description
Overview
This dataset provides a large-scale benchmark for developing and evaluating Intrusion Detection Systems (IDS) in Internet of Things (IoT) environments. It is designed to capture the heterogeneity and distributed nature of IoT network traffic, which are often not represented in existing datasets.
The dataset was generated using the open-source Gotham testbed and consists of network traffic collected at the interface level of 78 heterogeneous IoT devices within a virtualised smart city environment. Unlike traditional datasets based on centralised traffic aggregation, this dataset preserves device-level traffic traces and captures the naturally non-Independent and Identically Distributed (non-IID) data that arises across devices. This makes it particularly well suited to validating distributed, privacy-preserving AI-based security mechanisms.
Ground-truth labels are assigned using a deterministic process based on testbed orchestration logs, ensuring precise alignment between network traffic and attack events.
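One way to make the non-IID property concrete is to compare each device's label distribution with the pooled distribution that a centralised model would see. The sketch below does this with total variation distance. It assumes each packet carries a string class label; the device identifiers and label names used with it are hypothetical, not the dataset's actual values. Label skew is only one facet of non-IID data; feature and quantity skew across devices would need separate measures.

```python
from collections import Counter


def label_distribution(labels):
    """Empirical class distribution {label: probability} of a non-empty label list."""
    if not labels:
        raise ValueError("label list must be non-empty")
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in support)


def label_skew_per_device(labels_by_device):
    """Distance of each device's label distribution from the pooled (centralised) one."""
    pooled = [label for labels in labels_by_device.values() for label in labels]
    global_dist = label_distribution(pooled)
    return {
        device: total_variation(label_distribution(labels), global_dist)
        for device, labels in labels_by_device.items()
    }
```

For a toy input where a hypothetical `camera-01` has labels `["benign", "benign", "benign", "mirai"]` and `sensor-07` has four `"benign"` labels, both devices sit at a distance of 0.125 from the pooled distribution; values closer to 1 indicate stronger skew.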
Dataset Contents
The dataset is organised by device identifier: each file holds the traffic captured on the network interface of a single IoT node.
Data Formats:
- PCAP: Raw network traffic captures
- CSV: Processed packet-level feature representations (UTF-8 encoded, comma-separated)
Each CSV file corresponds directly to a PCAP file, while metadata files provide contextual information (timestamps, attacker IPs, attack types) used for deterministic labelling.
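Deterministic labelling of this kind can be pictured as matching each packet against attack windows recovered from the metadata. The sketch below is an illustration under stated assumptions, not the logic of the gotham-network-packet-labeller tool: it assumes each metadata record gives an attack start and end time, an attacker IP, and an attack type, and that a packet is malicious when its timestamp falls inside a window and the attacker IP appears as its source or destination. The `AttackWindow` record, `label_packet` function, IP addresses, and label strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackWindow:
    """Hypothetical metadata record describing one attack event."""
    start: float       # attack start, epoch seconds (inclusive)
    end: float         # attack end, epoch seconds (inclusive)
    attacker_ip: str
    attack_type: str


def label_packet(timestamp, src_ip, dst_ip, windows, benign_label="benign"):
    """Return the attack type of the first window that matches the packet, else benign_label."""
    for w in windows:
        in_window = w.start <= timestamp <= w.end
        involves_attacker = w.attacker_ip in (src_ip, dst_ip)
        if in_window and involves_attacker:
            return w.attack_type
    return benign_label


# Hypothetical attack window and addresses, for illustration only.
windows = [AttackWindow(100.0, 200.0, "10.0.0.66", "mirai_scan")]
print(label_packet(150.0, "10.0.0.66", "10.0.1.5", windows))  # mirai_scan
print(label_packet(250.0, "10.0.0.66", "10.0.1.5", windows))  # benign (outside the window)
```

When windows overlap, the first match wins, so the order of the window list acts as a priority; an actual labeller may resolve overlaps differently.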
Attack Types Included:
- Mirai Botnet: Full kill-chain simulation (Scanning, Brute Force, Infection, Volumetric Flooding).
- Merlin C2: Command-and-control traffic over HTTP/1.1, HTTP/2, and HTTP/3 (QUIC).
- Network Reconnaissance: Masscan (TCP) and Nmap (UDP) sweeps at varying rates.
- Amplification: CoAP reflection attacks generating DDoS traffic.
- Denial of Service: UDP floods, TCP SYN/ACK floods, and DNS water torture.
Usage
The processed dataset is provided in CSV format, where each file contains packet-level features extracted from the raw network traffic of a single IoT device, enabling device-level analysis.
For centralised analysis, CSV files can be concatenated to form a unified dataset. For distributed or decentralised settings, each file (or group of files) can be treated as an independent data source, reflecting realistic variations in traffic distributions across devices.
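Both settings can be prepared with the Python standard library alone. The sketch assumes, hypothetically, that the CSV files sit in one directory and that each file name (without the `.csv` extension) is the device identifier; the real archive layout may differ. `load_device_partitions` gives the per-device view for distributed experiments, `group_partitions` merges several devices into one client, and `centralise` concatenates everything while tagging each row with its source device so the non-IID structure is not lost. All three function names are illustrative.

```python
import csv
from pathlib import Path


def load_device_partitions(csv_dir):
    """Read every *.csv in csv_dir into {device_id: [row dicts]}.

    Assumes one CSV per device, named after its device identifier (hypothetical layout).
    """
    partitions = {}
    for path in sorted(Path(csv_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as f:
            partitions[path.stem] = list(csv.DictReader(f))
    return partitions


def group_partitions(partitions, groups):
    """Merge device partitions into client partitions; groups maps client name -> device IDs."""
    return {
        client: [row for device_id in device_ids for row in partitions[device_id]]
        for client, device_ids in groups.items()
    }


def centralise(partitions):
    """Concatenate all device partitions, adding a device_id column to every row."""
    merged = []
    for device_id, rows in partitions.items():
        for row in rows:
            merged.append({**row, "device_id": device_id})
    return merged
```

The `csv` module returns every field as a string, so numeric features need explicit conversion, and concatenation assumes all files share the same header. Because this loads whole files into memory, a streaming or columnar reader may be preferable for the full dataset.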
Users interested in custom feature extraction or protocol-level analysis may refer to the corresponding raw PCAP files.
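Dedicated packet libraries are usually the practical choice for custom feature extraction, but the classic libpcap file layout is simple enough to read directly: a 24-byte global header followed by records that each begin with a 16-byte header (seconds, sub-second fraction, captured length, original length). The sketch below assumes the captures are classic PCAP rather than pcapng, which the description does not specify, and rejects anything else. `iter_pcap_records` is an illustrative name.

```python
import struct

# On-disk magic bytes -> (struct byte-order prefix, divisor for the sub-second field).
_PCAP_MAGICS = {
    b"\xd4\xc3\xb2\xa1": ("<", 1e6),  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": (">", 1e6),  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": ("<", 1e9),  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": (">", 1e9),  # big-endian, nanosecond timestamps
}


def iter_pcap_records(buf):
    """Yield (timestamp, packet_bytes, original_length) from a classic PCAP buffer.

    buf may be bytes or an mmap of the file; pcapng and other formats raise ValueError.
    """
    if len(buf) < 24:
        raise ValueError("buffer too short for a PCAP global header")
    try:
        endian, divisor = _PCAP_MAGICS[bytes(buf[:4])]
    except KeyError:
        raise ValueError("not a classic PCAP file (pcapng is not handled)") from None
    offset = 24  # skip the global header
    while offset + 16 <= len(buf):
        ts_sec, ts_frac, incl_len, orig_len = struct.unpack_from(endian + "IIII", buf, offset)
        offset += 16
        yield ts_sec + ts_frac / divisor, bytes(buf[offset:offset + incl_len]), orig_len
        offset += incl_len
```

Because `struct.unpack_from` and slicing also work on `mmap` objects, a large capture can be passed as `mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)` instead of being read into memory. Decoding the link-layer and IP headers inside `packet_bytes` is left to the caller.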
Files
| Name | Size | MD5 checksum |
|---|---|---|
| GothamDataset2025.zip | 23.8 GB | 7ca78c0517ccb3d2854e823678e0f206 |
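After downloading, the archive can be checked against the MD5 checksum listed for GothamDataset2025.zip. The sketch streams the file in 1 MiB chunks so the 23.8 GB archive never has to fit in memory; the local file path and the function names are assumptions.

```python
import hashlib

# Checksum published alongside GothamDataset2025.zip.
EXPECTED_MD5 = "7ca78c0517ccb3d2854e823678e0f206"


def md5_of_file(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks (1 MiB by default) to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest matches the expected checksum (case-insensitive)."""
    return md5_of_file(path) == expected.lower()


# Hypothetical local path to the downloaded archive:
# print(verify_archive("GothamDataset2025.zip"))
```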
Additional details
Software
- Repository URL: https://github.com/othmbela/gotham-network-packet-labeller
- Programming language: Python
- Development status: Active