Gotham Dataset 2025: A Reproducible Large-Scale IoT Network Dataset for Intrusion Detection and Security Research

Belarbi, Othmane; Spyridopoulos, Theodoros; Anthi, Eirini; Rana, Omer; Carnelli, Pietro; Khan, Aftab

doi:10.5281/zenodo.14502760

Published February 5, 2025 | Version v1

Dataset Open

1. Cardiff University
2. Toshiba Europe Limited, Bristol Research & Innovation Laboratory

📌 Overview

This dataset provides a realistic, large-scale benchmark for developing and evaluating decentralised Intrusion Detection Systems (IDS) and Federated Learning (FL) algorithms in Internet of Things (IoT) environments.

Generated using the open-source Gotham testbed, the dataset captures network traffic at the interface level of 78 heterogeneous IoT devices within a virtualised smart city. Unlike traditional datasets that aggregate traffic centrally, it preserves the granular, non-independent and identically distributed (non-IID) nature of edge data, which makes it well suited to validating distributed, privacy-preserving AI security mechanisms.

📂 Dataset Contents

The dataset is organised by device ID. Each file contains packet features extracted from the network interface of a specific IoT node.

Attack Types Included:

Mirai Botnet: Full kill-chain simulation (Scanning, Brute Force, Infection, Volumetric Flooding).
Merlin C2: HTTP/1.1, HTTP/2, and HTTP/3 (QUIC) command and control traffic.
Network Reconnaissance: Masscan (TCP) and Nmap (UDP) sweeping at varying rates.
Amplification: CoAP reflection attacks that generate DDoS traffic.
Denial of Service: UDP Floods, TCP SYN/ACK Floods, DNS Water Torture.

🛠 Usage

This folder contains CSV files derived from the Raw Data. Each CSV file holds feature vectors extracted from network packets, so the unstructured packet data arrives in a structured format ready for machine learning or statistical analysis.

For Centralised Learning: Concatenate all CSV files.
For Federated Learning: Treat each CSV file (or cluster of files) as a local client's private dataset to test aggregation algorithms (e.g., FedAvg) under realistic skew.
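
The two usage modes above can be sketched with the Python standard library. This is a minimal illustration, not the authors' pipeline: it assumes one CSV file per device with a header row, and that the file stem can serve as the client ID. The helper names `load_clients`, `centralise`, and `fedavg` are hypothetical, and the column names inside the CSV files are not assumed. Loading every row into memory only suits a small subset of the 23.8 GB archive.

```python
import csv
from pathlib import Path


def load_clients(root):
    """Map each CSV file stem (assumed to be a device ID) to its rows.

    Assumption: one CSV per device, each with a header row. Files that
    share a stem in different subfolders would overwrite each other.
    """
    clients = {}
    for path in sorted(Path(root).rglob("*.csv")):
        with path.open(newline="") as handle:
            clients[path.stem] = list(csv.DictReader(handle))
    return clients


def centralise(clients):
    """Centralised learning: concatenate every client's rows."""
    return [row for rows in clients.values() for row in rows]


def fedavg(client_weights, client_sizes):
    """FedAvg aggregation step: average the client weight vectors,
    weighting each client by its local sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]
```

In a federated experiment, each entry returned by `load_clients` would stay with its own client for local training, and only the resulting weight vectors, together with the per-client row counts, would be passed to `fedavg`.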

Files

Name	Size
GothamDataset2025.zip (md5:7ca78c0517ccb3d2854e823678e0f206)	23.8 GB
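
Given the size of the archive, it may be worth checking the download against the MD5 checksum listed above before extracting it. The following is a minimal sketch using Python's `hashlib`; the helper name `md5_of_stream` and the local file path in the usage note are assumptions, not part of the dataset record.

```python
import hashlib

# Checksum published alongside GothamDataset2025.zip on the record page.
EXPECTED_MD5 = "7ca78c0517ccb3d2854e823678e0f206"


def md5_of_stream(stream, chunk_size=1 << 20):
    """Return the hex MD5 digest of a binary stream, read in chunks so
    that a multi-gigabyte file never has to fit in memory."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()
```

Usage, assuming the archive was saved to the current directory: open it with `open("GothamDataset2025.zip", "rb")`, pass the handle to `md5_of_stream`, and compare the result with `EXPECTED_MD5`. A mismatch suggests an incomplete or corrupted download.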

Additional details

Repository URL: https://github.com/othmbela/gotham-network-packet-labeller
Programming language: Python
Development Status: Active

