Published February 5, 2025 | Version v1
Dataset | Open Access

Gotham Dataset 2025: A Reproducible Large-Scale IoT Network Dataset for Intrusion Detection and Security Research

  • 1. Cardiff University
  • 2. Toshiba Europe Limited, Bristol Research & Innovation Laboratory

Description

📌 Overview

This dataset provides a realistic, large-scale benchmark for developing and evaluating decentralised Intrusion Detection Systems (IDS) and Federated Learning (FL) algorithms in Internet of Things (IoT) environments.

Generated using the open-source Gotham testbed, the dataset captures network traffic at the interface level of 78 heterogeneous IoT devices within a virtualised smart city. Unlike traditional datasets that aggregate traffic centrally, this dataset preserves the granular, non-Independent and Identically Distributed (non-IID) nature of edge data, making it well suited to validating distributed, privacy-preserving AI security mechanisms.
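The degree of non-IID skew can be made concrete by comparing each device's class mix with the pooled class mix across all devices. The sketch below uses total variation distance (half the L1 distance between two discrete distributions), which is one common choice among several. It assumes each row carries a class label; the function names and the toy device labels are hypothetical and are not taken from the dataset's schema.

```python
from collections import Counter


def label_distribution(labels):
    """Return the empirical class distribution as {label: probability}."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot build a distribution from zero labels")
    return {label: n / total for label, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in support)


def label_skew(client_labels):
    """Map each client to the TV distance between its label mix and the pooled label mix."""
    pooled = [y for labels in client_labels.values() for y in labels]
    global_dist = label_distribution(pooled)
    return {
        client: total_variation(label_distribution(labels), global_dist)
        for client, labels in client_labels.items()
    }


# Hypothetical toy labels for three devices, for illustration only.
clients = {
    "device_a": ["benign"] * 8 + ["mirai"] * 2,
    "device_b": ["benign"] * 10,
    "device_c": ["mirai"] * 5 + ["scan"] * 5,
}
print(label_skew(clients))
```

Devices whose traffic is dominated by a single class (such as `device_c` here) score closer to 1, which is the kind of skew that federated aggregation has to cope with.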

📂 Dataset Contents

The dataset is organised by device ID. Each file contains packet features extracted from the network interface of a specific IoT node.
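Because the files are organised by device ID, a per-device loader is a natural starting point. The following is a minimal standard-library sketch, assuming one CSV per device whose file stem is the device ID; that naming convention is an assumption, as are the helper names `device_files` and `iter_rows`. Rows are streamed one at a time rather than loaded whole, since the compressed archive alone is 23.8 GB.

```python
import csv
from pathlib import Path


def device_files(root):
    """Map each device ID to its CSV path.

    Assumption (hypothetical naming): one CSV per device, file stem == device ID.
    Files in sub-folders are included; a duplicated stem overwrites an earlier entry.
    """
    return {path.stem: path for path in sorted(Path(root).rglob("*.csv"))}


def iter_rows(path):
    """Yield one device's rows as dicts (column name -> string value), one at a time."""
    with Path(path).open(newline="") as fh:
        yield from csv.DictReader(fh)
```

Values come back as strings, so numeric features need casting before training. Chaining `iter_rows` over every path pools all devices (the centralised setting), while each entry of `device_files(...)` can serve as one client (the federated setting).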

Attack Types Included:

  1. Mirai Botnet: Full kill-chain simulation (Scanning, Brute Force, Infection, Volumetric Flooding).

  2. Merlin C2: HTTP/1.1, HTTP/2, and HTTP/3 (QUIC) command and control traffic.

  3. Network Reconnaissance: Masscan (TCP) and Nmap (UDP) scans at varying rates.

  4. Amplification: CoAP reflection attacks generating DDoS traffic.

  5. Denial of Service: UDP Floods, TCP SYN/ACK Floods, DNS Water Torture.

🛠 Usage

This folder contains CSV files derived from the Raw Data. Each CSV file includes feature vectors extracted from network packets, converting unstructured packet data into a structured format ready for machine learning or statistical analysis.

  • For Centralised Learning: Concatenate all CSV files.

  • For Federated Learning: Treat each CSV file (or cluster of files) as a local client's private dataset to test aggregation algorithms (e.g., FedAvg) under realistic skew.
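In the federated setting, the server-side aggregation step of FedAvg can be sketched as a weighted average of client parameters, with each client weighted by its number of local training samples (here, the rows in that client's CSV file or files). This is a minimal sketch, assuming model parameters are flattened into equal-length lists of floats; the function name `fedavg` is hypothetical, and local training on each client is omitted.

```python
def fedavg(client_params, client_sizes):
    """FedAvg server step: sum over clients k of (n_k / n) * w_k.

    client_params: one flattened parameter vector (list of floats) per client.
    client_sizes:  number of local training samples n_k per client.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_params[0])
    aggregated = [0.0] * dim
    for params, n_k in zip(client_params, client_sizes):
        if len(params) != dim:
            raise ValueError("all clients must share the same parameter shape")
        weight = n_k / total  # this client's share of all training samples
        for i, w in enumerate(params):
            aggregated[i] += weight * w
    return aggregated


# Toy example: the larger client (300 samples) dominates the average.
print(fedavg([[1.0, 2.0], [3.0, 0.0]], [100, 300]))  # → [2.5, 0.5]
```

Because devices contribute very different numbers of packets, the sample-count weighting means high-traffic clients pull the global model harder, which is one reason realistic skew matters when comparing aggregation algorithms.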

Files (23.8 GB)

  • GothamDataset2025.zip (23.8 GB), md5: 7ca78c0517ccb3d2854e823678e0f206
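The listed MD5 checksum can be used to confirm that the download completed intact before unpacking it. This is a minimal standard-library sketch that reads the file in fixed-size chunks so the archive is never held in memory; the helper name `md5sum` and the assumption that the archive keeps its listed file name are illustrative.

```python
import hashlib
from pathlib import Path

# Checksum listed for GothamDataset2025.zip.
EXPECTED_MD5 = "7ca78c0517ccb3d2854e823678e0f206"


def md5sum(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks (1 MiB by default)."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `md5sum("GothamDataset2025.zip") == EXPECTED_MD5` should evaluate to `True` for a complete, uncorrupted download.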

Additional details

Software

  • Repository URL: https://github.com/othmbela/gotham-network-packet-labeller
  • Programming language: Python
  • Development status: Active