CICIoMT2024 Cleaned Dataset and EDA Outputs
Authors/Creators
Description
This repository contains the processed datasets and exploratory data analysis (EDA) outputs generated as part of the research paper <TBC>. The artefacts in this repository serve as the empirical foundation for the <TBC>.
The source dataset, CICIoMT2024, was originally published by Dadkhah et al. (2024) at the Canadian Institute for Cybersecurity, University of New Brunswick, and is available at https://www.unb.ca/cic/datasets/iomt-dataset-2024.html. The raw dataset comprises 8,775,013 flow records collected from 40 IoMT devices across three communication protocols (WiFi, MQTT, and Bluetooth Low Energy), covering 18 labeled attack types across five categories.
The files in this repository are not the raw CICIoMT2024 files; they are cleaned, structured, and analytically processed derivatives produced through a four-phase EDA pipeline. All outputs were produced using Python 3.11 (scikit-learn 1.4, pandas 2.1, NumPy 1.26, matplotlib 3.8, seaborn 0.13, scipy 1.12) in a Jupyter Notebook environment.
Core Processed Data (Essential)
These files are the outputs of cleaning the original CICIoMT2024 dataset and are the primary reproducibility artifacts:
- wifimqtt_clean.parquet contains 9,128,910 rows of cleaned and labelled WiFi/MQTT flows
- bluetooth_clean.parquet contains cleaned and labelled Bluetooth packet-level records
- cleaning_summary.csv contains row counts, column counts, and unique label counts for both subsets
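As a rough illustration of the kind of figures cleaning_summary.csv records, the sketch below computes a row count, column count, and unique label count for a DataFrame. The column name `label` and the helper `summarize_subset` are assumptions made for illustration; the actual schema of the parquet files may differ. With the real data, the frame would typically come from `pd.read_parquet("wifimqtt_clean.parquet")`, which needs a parquet engine such as pyarrow installed.

```python
import pandas as pd


def summarize_subset(df: pd.DataFrame, label_col: str = "label") -> dict:
    """Return row count, column count, and number of distinct labels.

    `label_col` is a hypothetical column name; adjust it to the real schema.
    """
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "unique_labels": int(df[label_col].nunique()),
    }


# Tiny stand-in frame with made-up values. With the real data one would use,
# for example: df = pd.read_parquet("wifimqtt_clean.parquet")
demo = pd.DataFrame(
    {
        "flow_rate": [0.1, 0.4, 0.4, 2.3],
        "label": ["Benign", "DDoS", "DDoS", "Recon"],
    }
)
summary = summarize_subset(demo)
print(summary)
```

Comparing such a summary against the values in cleaning_summary.csv is one simple way to confirm that a download was read correctly.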
For Framework Development
Phase 1 - Traffic Distribution
- benign_attack_ratio.csv
- traffic_by_protocol.csv
- protocol_contribution.csv
Phase 2 - Attack Category Profiling
- wifimqtt_subtype_counts.csv
- wifi_attack_category_profile.csv
- bluetooth_category_counts.csv
Phase 3 - Feature Correlation & Redundancy
- wifimqtt_correlation_matrix.csv
- wifimqtt_high_corr_pairs.csv
- wifimqtt_feature_tier_assignments.csv
- bluetooth_correlation_matrix.csv
- bluetooth_high_corr_pairs.csv
- bluetooth_feature_tier_assignments.csv
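For readers who want to derive a list of highly correlated feature pairs from their own data, a minimal sketch follows. It assumes Pearson correlation and a cutoff of 0.9; both are illustrative choices, and the threshold and method actually used to produce the `*_high_corr_pairs.csv` files are not stated here. The function name `high_corr_pairs` and the output column names are hypothetical.

```python
import numpy as np
import pandas as pd


def high_corr_pairs(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """List feature pairs whose absolute Pearson correlation meets the threshold.

    The 0.9 default is an assumed cutoff, not a value taken from the pipeline.
    """
    corr = df.corr(numeric_only=True)
    # Keep only the strict upper triangle so each pair appears once
    # and self-correlations on the diagonal are excluded.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna().reset_index()
    pairs.columns = ["feature_a", "feature_b", "correlation"]
    strong = pairs[pairs["correlation"].abs() >= threshold]
    return strong.reset_index(drop=True)


# Made-up numeric features: "b" is an exact multiple of "a", "c" is unrelated.
demo = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5],
        "b": [2, 4, 6, 8, 10],
        "c": [5, 1, 4, 2, 3],
    }
)
result = high_corr_pairs(demo)
print(result)
```

Only the pair (`a`, `b`) passes the cutoff in this toy frame, since `c` correlates only weakly with the other two columns.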
Phase 4 - Class Imbalance
- wifimqtt_imbalance.csv
- wifimqtt_entropy_summary.csv
- bluetooth_imbalance.csv
- bluetooth_entropy_summary.csv
- entropy_comparison.csv
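The exact metrics stored in the imbalance and entropy files are not defined here, so the following is only a sketch of two common measures of class imbalance: Shannon entropy of the label distribution (in bits, plus a version normalized by its maximum) and the ratio of the largest to the smallest class. The function names and the sample labels are hypothetical, and the label list is assumed to be non-empty.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Return (Shannon entropy in bits, entropy normalized to [0, 1])."""
    counts = Counter(labels)
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    # Maximum entropy is log2(k) for k classes; guard the single-class case.
    max_entropy = math.log2(len(counts)) if len(counts) > 1 else 1.0
    return entropy, entropy / max_entropy


def imbalance_ratio(labels):
    """Ratio of the largest class count to the smallest class count."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)


# Made-up labels: 2 benign records and 6 attack records.
demo_labels = ["Benign"] * 2 + ["DDoS"] * 6
entropy_bits, normalized = class_entropy(demo_labels)
ratio = imbalance_ratio(demo_labels)
print(round(entropy_bits, 4), round(normalized, 4), ratio)
```

A normalized entropy near 1 indicates classes of similar size, while values near 0 indicate that one class dominates.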
Files (250.6 MB)
Core Processed Data.zip
| MD5 | Size |
|---|---|
| ff856492ab3ca66234c3b5968a74ce85 | 250.6 MB |
| 8e4a994ebd83b89e1f43782572fe9874 | 798 Bytes |
| da42f2ae8f8658d359f339f744614818 | 1.3 kB |
| 75d279a643701ff98fe7f102a7900c74 | 15.1 kB |
| 43b288c5101242a96d428e1676ca7268 | 2.1 kB |
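The MD5 checksums listed for the files can be used to confirm that a download is intact. The sketch below uses only the Python standard library; the helper names `md5_of_file` and `verify_download` are illustrative, and which checksum belongs to which file has to be taken from the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Example call (path and checksum pairing are placeholders to be filled in):
# verify_download("<downloaded file>", "<md5 from the file listing>")
```

MD5 is suitable here as an integrity check against corrupted downloads; it is not a defence against deliberate tampering.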