SmartHome-IoT-Federated-Anomaly-Dataset
Authors/Creators
- 1. Faculty of Engineering, Department of Computer Engineering, Kocaeli University, Kocaeli, Turkey
- 2. Department of Mathematics, College of Education, Salahaddin University, Erbil, Iraq
- 3. Department of Digital Forensics Specialization, Institution of Forensic Medicine, Istanbul, Turkey
Description
This record provides a real-world, heterogeneous smart-home IoT dataset designed to support anomaly detection research under both centralized machine learning (ML) and federated learning (FL) paradigms.
The dataset was collected from a residential deployment of four Raspberry Pi devices (Pi-4 to Pi-7), each equipped with a different sensor configuration, including temperature, humidity, motion (PIR), accelerometer (ADXL345), and gas (MQ) sensors. This heterogeneous setup yields different feature spaces per device and data that are naturally non-IID (not independent and identically distributed) across devices.
The dataset contains over 7 million multivariate time-series records with a sampling interval of approximately 2–2.5 seconds. It captures realistic IoT characteristics, including temporal irregularities, missing values, sensor noise, and highly imbalanced anomaly distributions (~0.6% anomalies).
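The sampling interval and class imbalance described above can be checked with a few lines of code. The following is a minimal sketch on synthetic values, not the dataset's own tooling: the function names, the 5-second gap threshold, and the convention that label `1` marks an anomaly are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def sampling_gaps(timestamps, max_gap_s=5.0):
    """Return (gaps in seconds, indices of records that follow a gap > max_gap_s).

    max_gap_s is a hypothetical threshold; the record only states a nominal
    interval of roughly 2-2.5 seconds.
    """
    gaps = [(b - a).total_seconds() for a, b in zip(timestamps, timestamps[1:])]
    irregular = [i + 1 for i, g in enumerate(gaps) if g > max_gap_s]
    return gaps, irregular


def anomaly_rate(labels):
    """Fraction of records labeled anomalous (assumes label 1 means anomaly)."""
    return sum(1 for y in labels if y == 1) / len(labels) if labels else 0.0


# Synthetic timestamps: regular sampling with one long gap before the 5th record.
start = datetime(2024, 1, 1, 0, 0, 0)
offsets = [0.0, 2.0, 4.5, 6.5, 20.0, 22.5]
ts = [start + timedelta(seconds=s) for s in offsets]

gaps, irregular = sampling_gaps(ts)
rate = anomaly_rate([0, 0, 0, 1])
```

On the real data, the same two checks would be run per device, since each Raspberry Pi logs independently.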
Anomalies were introduced using a controlled marker-based injection framework, simulating real-world conditions such as sensor faults, environmental changes, device stress, and network disturbances. In addition, retroactive anomaly scenarios are provided for selected time periods (e.g., holiday intervals) to support reproducibility and controlled experimentation.
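The record does not specify how markers are stored, so the following is only a hedged sketch of one way marker-based labeling could work: each marker is assumed to be a time interval with a scenario name, and every record whose timestamp falls inside an interval is labeled anomalous. The field names `t`, `start`, `end`, `scenario`, and `anomaly` are hypothetical.

```python
def label_from_markers(records, markers):
    """Attach 'anomaly' (0/1) and 'scenario' to each record.

    Assumption: a marker is a half-open interval [start, end) plus a scenario
    name. This is an illustration, not the dataset's actual injection framework.
    """
    labeled = []
    for rec in records:
        # First marker whose interval contains the record's timestamp, if any.
        hit = next((m for m in markers if m["start"] <= rec["t"] < m["end"]), None)
        labeled.append({
            **rec,
            "anomaly": int(hit is not None),
            "scenario": hit["scenario"] if hit else None,
        })
    return labeled


# Synthetic records (seconds since start) and one hypothetical marker.
records = [
    {"t": 0.0, "temp": 21.0},
    {"t": 2.0, "temp": 21.1},
    {"t": 4.5, "temp": 35.0},
    {"t": 6.5, "temp": 21.2},
]
markers = [{"start": 4.0, "end": 6.0, "scenario": "sensor_fault"}]

labeled = label_from_markers(records, markers)
```

Keeping the scenario name next to the binary label makes it possible to evaluate detectors per anomaly type as well as overall.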
A comprehensive integrity-aware data curation pipeline was applied, including timestamp normalization, detection and correction of malformed entries, duplicate removal, anomaly label validation, and audit logging. The final curated dataset ensures high data quality and full traceability.
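A simplified sketch of such a curation pass is shown below. It is not the authors' pipeline: the field names (`device`, `timestamp`, `value`) are assumptions, naive timestamps are assumed to be UTC, and malformed rows are dropped rather than corrected to keep the example short.

```python
from datetime import datetime, timezone


def curate(rows):
    """Normalize timestamps to UTC, drop malformed and duplicate rows, log each drop.

    Returns (clean_rows, audit_log). Each audit entry is
    (row_index, action, detail).
    """
    clean, audit, seen = [], [], set()
    for i, row in enumerate(rows):
        try:
            ts = datetime.fromisoformat(row["timestamp"])
            value = float(row["value"])
        except (KeyError, TypeError, ValueError) as exc:
            audit.append((i, "dropped_malformed", type(exc).__name__))
            continue
        if ts.tzinfo is None:
            ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
        ts = ts.astimezone(timezone.utc)
        key = (row.get("device"), ts)  # duplicate = same device and instant
        if key in seen:
            audit.append((i, "dropped_duplicate", ""))
            continue
        seen.add(key)
        clean.append({"device": row.get("device"),
                      "timestamp": ts.isoformat(),
                      "value": value})
    return clean, audit


# Synthetic rows: the second is a duplicate once normalized, the third is malformed.
rows = [
    {"device": "Pi-4", "timestamp": "2024-01-01T03:00:00+03:00", "value": "21.5"},
    {"device": "Pi-4", "timestamp": "2024-01-01T00:00:00", "value": "21.5"},
    {"device": "Pi-5", "timestamp": "not-a-date", "value": "3"},
]
clean, audit = curate(rows)
```

Recording the row index and reason for every dropped row is what makes the output traceable back to the raw input.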
The dataset is released in multiple formats to support different research settings:
- A centralized ML-ready dataset
- Device-level partitions representing federated learning clients (Pi-4 to Pi-7)
- Experimental subsets for lightweight benchmarking
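For readers starting from the centralized file, device-level client partitions can be derived by grouping on a device identifier. The sketch below assumes a `device` field and uses made-up sensor assignments; the actual column names and the sensors attached to each Raspberry Pi should be taken from the released files.

```python
from collections import defaultdict


def partition_by_device(records, device_key="device"):
    """Group records into one list per device, i.e. one federated client each."""
    clients = defaultdict(list)
    for rec in records:
        clients[rec[device_key]].append(rec)
    return dict(clients)


def client_feature_sets(clients, ignore=("device", "timestamp", "anomaly")):
    """List the sensor fields each client actually reports."""
    return {
        dev: sorted({k for rec in recs for k in rec if k not in ignore})
        for dev, recs in clients.items()
    }


# Synthetic records; the sensor-to-device mapping here is invented.
records = [
    {"device": "Pi-4", "timestamp": 0.0, "temperature": 21.0, "humidity": 40.0, "anomaly": 0},
    {"device": "Pi-5", "timestamp": 0.0, "pir": 1, "anomaly": 0},
    {"device": "Pi-4", "timestamp": 2.0, "temperature": 21.1, "humidity": 41.0, "anomaly": 1},
]

clients = partition_by_device(records)
features = client_feature_sets(clients)
```

Comparing the per-client feature sets makes the heterogeneity explicit: clients with different sensors do not share an input space, which any federated model has to account for.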
Key characteristics:
- Real-world IoT deployment
- Device-level heterogeneity
- Non-IID data distribution
- Multivariate time-series structure
- Highly imbalanced anomaly classes
- Fully reproducible preprocessing pipeline
This dataset provides a realistic benchmark for evaluating anomaly detection methods in distributed IoT environments, particularly for federated learning under practical constraints.
Files
| Name | Size | Checksum |
|---|---|---|
| SmartHome-IoT-Federated-Anomaly-Dataset.zip | 275.9 MB | md5:177a0a79d63ef06c53621a367e8ccaeb |
Additional details
Additional titles
- Alternative title (English): A Real-World Heterogeneous Smart-Home IoT Dataset for Federated Anomaly Detection
- Alternative title (English): Smart-Home IoT Dataset for Anomaly Detection and Machine Learning
- Alternative title: Multivariate IoT Time-Series Dataset for Anomaly Detection in Smart Homes
- Alternative title: Real-World Smart-Home IoT Dataset with Heterogeneous Sensor Data for Machine Learning
- Alternative title: Smart-Home IoT Dataset for Machine Learning and Federated Anomaly Detection