Published April 7, 2026 | Version 1.0
Dataset Open

CICIoMT2024 Cleaned Dataset and EDA Outputs

  • 1. Bina Nusantara University
  • 2. Harapan Bangsa Institute of Technology
  • 3. Bandung Institute of Technology
  • 4. University of Indonesia

Description

This repository contains the processed datasets and exploratory data analysis (EDA) outputs generated as part of the research paper <TBC>. The artefacts in this repository serve as the empirical foundation for the <TBC>.

The source dataset, CICIoMT2024, was originally published by Dadkhah et al. (2024) at the Canadian Institute for Cybersecurity, University of New Brunswick, and is available at https://www.unb.ca/cic/datasets/iomt-dataset-2024.html. The raw dataset comprises 8,775,013 flow records collected from 40 IoMT devices across three communication protocols (WiFi, MQTT, and Bluetooth Low Energy), covering 18 labeled attack types across five categories.

The files in this repository are not the raw CICIoMT2024 files; they are cleaned, structured, and analytically processed derivatives produced by a four-phase EDA pipeline. All outputs were generated with Python 3.11 (scikit-learn 1.4, pandas 2.1, NumPy 1.26, matplotlib 3.8, seaborn 0.13, SciPy 1.12) in a Jupyter Notebook environment.

Core Processed Data (Essential)

These files are the outputs of cleaning the original CICIoMT2024 dataset and are the primary reproducibility artifacts:

  • wifimqtt_clean.parquet contains 9,128,910 rows of cleaned and labelled WiFi/MQTT flows
  • bluetooth_clean.parquet contains cleaned and labelled Bluetooth packet-level records
  • cleaning_summary.csv contains row counts, column counts, and unique label counts for both subsets

For Framework Development

Phase 1 - Traffic Distribution

  • benign_attack_ratio.csv
  • traffic_by_protocol.csv
  • protocol_contribution.csv

Phase 2 - Attack Category Profiling

  • wifimqtt_subtype_counts.csv
  • wifi_attack_category_profile.csv
  • bluetooth_category_counts.csv

Phase 3 - Feature Correlation & Redundancy

  • wifimqtt_correlation_matrix.csv
  • wifimqtt_high_corr_pairs.csv
  • wifimqtt_feature_tier_assignments.csv
  • bluetooth_correlation_matrix.csv
  • bluetooth_high_corr_pairs.csv
  • bluetooth_feature_tier_assignments.csv
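
The Phase 3 file names suggest a correlation matrix from which highly correlated feature pairs are extracted. The pipeline's actual correlation method and cut-off are not stated here, so the following is only a minimal sketch of one common approach, assuming Pearson correlation and a hypothetical threshold of 0.9. The function name `high_corr_pairs` and the toy columns `a`, `b`, `c` are illustrative and do not come from the dataset.

```python
import numpy as np
import pandas as pd


def high_corr_pairs(df, threshold=0.9):
    """List feature pairs whose absolute Pearson correlation meets a threshold.

    The threshold default is an assumption, not the value used for the
    published *_high_corr_pairs.csv files.
    """
    corr = df.corr(method="pearson").abs()
    # Keep only the strict upper triangle so each pair is reported once
    # and the diagonal (self-correlation) is skipped.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack()
    # The comparison also discards any NaN entries left by the mask.
    pairs = pairs[pairs >= threshold].sort_values(ascending=False)
    return [(a, b, float(r)) for (a, b), r in pairs.items()]


# Toy data with hypothetical column names: b is an exact multiple of a.
toy = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5],
        "b": [2, 4, 6, 8, 10],
        "c": [5, 1, 4, 2, 3],
    }
)
redundant = high_corr_pairs(toy, threshold=0.9)
```

On the real data, the same function could be applied to the numeric columns of a frame loaded from one of the cleaned parquet files. Taking the absolute value treats strong negative correlation as redundancy too, which is a design choice rather than something the record specifies.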

Phase 4 - Class Imbalance

  • wifimqtt_imbalance.csv
  • wifimqtt_entropy_summary.csv
  • bluetooth_imbalance.csv
  • bluetooth_entropy_summary.csv
  • entropy_comparison.csv
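
The exact statistics stored in the imbalance and entropy files are not spelled out in this description. As a rough sketch only, assuming Shannon entropy over class counts and a max/min imbalance ratio (both assumptions), the quantities could be computed as below. The function names and the example counts are hypothetical and are not taken from the dataset.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a class distribution given raw counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    # Zero-count classes contribute nothing to the sum.
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


def normalized_entropy(counts):
    """Entropy divided by its maximum, log2(number of classes).

    1.0 means perfectly balanced classes; values near 0 mean one class
    dominates. With fewer than two classes the result is defined as 0.0.
    """
    k = len(counts)
    if k < 2:
        return 0.0
    return shannon_entropy(counts) / math.log2(k)


def imbalance_ratio(counts):
    """Largest class count divided by the smallest non-zero class count."""
    positive = [c for c in counts if c > 0]
    return max(positive) / min(positive)


# Illustrative counts only, not values from CICIoMT2024.
example_counts = [900, 80, 20]
h = shannon_entropy(example_counts)
h_norm = normalized_entropy(example_counts)
ratio = imbalance_ratio(example_counts)
```

Normalizing by the maximum entropy makes subsets with different numbers of classes comparable, which is one plausible reason to keep a separate comparison file for the WiFi/MQTT and Bluetooth subsets.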

Files

Core Processed Data.zip

Files (250.6 MB)

  • md5:ff856492ab3ca66234c3b5968a74ce85 (250.6 MB)
  • md5:8e4a994ebd83b89e1f43782572fe9874 (798 Bytes)
  • md5:da42f2ae8f8658d359f339f744614818 (1.3 kB)
  • md5:75d279a643701ff98fe7f102a7900c74 (15.1 kB)
  • md5:43b288c5101242a96d428e1676ca7268 (2.1 kB)