Catching the Rare: Ensemble and Linear Models for Imbalanced Network Intrusion Detection

Hardik Tiwari

doi:10.5281/zenodo.18408605

Published January 29, 2026 | Version v2

Preprint Open

Contributors

Researcher:

Hardik Tiwari

This record contains the materials for the study “Catching the Rare: Ensemble and Linear Models for Imbalanced Network Intrusion Detection.” The work investigates supervised machine learning approaches for binary network intrusion detection under severe class imbalance, with a focus on identifying rare attack instances.

The study is motivated by attack behaviors commonly analyzed in honeypot environments but does not involve live honeypot deployment. Experiments are conducted using the publicly available CICIDS 2017 dataset, generated in a controlled environment that simulates realistic benign and malicious network traffic.

Three supervised learning models (Logistic Regression, Random Forest, and XGBoost) are evaluated to compare linear and ensemble-based approaches. Model performance is assessed using imbalance-aware metrics, including precision–recall curves, ROC analysis, and balanced accuracy, rather than accuracy alone. Feature importances and model coefficients are analyzed to provide interpretable insights into the network flow characteristics associated with malicious activity.
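To illustrate why accuracy alone can mislead under severe class imbalance, the following is a minimal, dependency-free sketch. It is not the study's evaluation code: the helper names (`confusion_counts`, `imbalance_report`) are hypothetical, and the labels are synthetic (990 benign flows, 10 attacks) rather than drawn from CICIDS 2017.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 = attack, 0 = benign."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def imbalance_report(y_true, y_pred):
    """Compute accuracy alongside imbalance-aware metrics."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    recall = safe_div(tp, tp + fn)        # fraction of attacks detected
    specificity = safe_div(tn, tn + fp)   # fraction of benign flows passed
    return {
        "accuracy": safe_div(tp + tn, len(y_true)),
        "precision": safe_div(tp, tp + fp),
        "recall": recall,
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Synthetic, illustrative labels only: 990 benign flows followed by 10 attacks.
y_true = [0] * 990 + [1] * 10

# A trivial detector that labels every flow as benign.
always_benign = [0] * 1000

# A hypothetical detector: 20 false alarms, 8 of the 10 attacks caught.
detector = [1] * 20 + [0] * 970 + [1] * 8 + [0] * 2

baseline_report = imbalance_report(y_true, always_benign)
detector_report = imbalance_report(y_true, detector)

for name, report in [("always benign", baseline_report), ("detector", detector_report)]:
    print(name, {k: round(v, 3) for k, v in report.items()})
```

On this synthetic split, the always-benign baseline reaches 0.99 accuracy while detecting no attacks, so its balanced accuracy is only 0.5. The hypothetical detector has slightly lower accuracy but much higher recall and balanced accuracy, which is the kind of difference that imbalance-aware metrics are intended to expose.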

The study provides a transparent and reproducible baseline for intrusion detection research inspired by honeypot traffic analysis. While results are based on simulated network data, the methodology and findings may inform future work involving live network traffic, deployed honeypots, or adaptive learning approaches.
