Published January 29, 2026 | Version v2
Preprint Open

Catching the Rare: Ensemble and Linear Models for Imbalanced Network Intrusion Detection

Authors/Creators

Contributors

Researcher:

Description

This record contains the materials for the study “Catching the Rare: Ensemble and Linear Models for Imbalanced Network Intrusion Detection.” The work investigates supervised machine learning approaches for binary network intrusion detection under severe class imbalance, with a focus on identifying rare attack instances.

The study is motivated by attack behaviors commonly analyzed in honeypot environments but does not involve live honeypot deployment. Experiments are conducted using the publicly available CICIDS 2017 dataset, generated in a controlled environment that simulates realistic benign and malicious network traffic.

Three supervised learning models (Logistic Regression, Random Forest, and XGBoost) are evaluated to compare linear and ensemble-based approaches. Model performance is assessed using imbalance-aware metrics, including precision–recall curves, ROC analysis, and balanced accuracy, rather than accuracy alone. Feature importances and model coefficients are analyzed to provide interpretable insights into the network flow characteristics associated with malicious activity.
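To illustrate why accuracy alone can mislead under severe class imbalance, the following minimal sketch computes accuracy, balanced accuracy, precision, and recall from a confusion matrix using only the Python standard library. The label vectors, the 990/10 class split, and the two "detectors" are hypothetical toy values chosen for illustration; they are not taken from CICIDS 2017 or from the study's results, and the helper names are not from the project repository.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # attacks correctly flagged as attacks
    fp: int  # benign flows wrongly flagged as attacks
    tn: int  # benign flows correctly passed
    fn: int  # attacks that were missed


def confusion(y_true, y_pred):
    """Count confusion-matrix cells for binary labels (1 = attack, 0 = benign)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return Confusion(tp, fp, tn, fn)


def _ratio(num, den):
    # Convention used in this sketch: an undefined ratio is reported as 0.0.
    return num / den if den else 0.0


def accuracy(c):
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)


def recall(c):
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c):
    return _ratio(c.tn, c.tn + c.fp)


def precision(c):
    return _ratio(c.tp, c.tp + c.fp)


def balanced_accuracy(c):
    # Mean of the per-class recalls (attack recall and benign recall).
    return (recall(c) + specificity(c)) / 2


# Hypothetical toy data: 990 benign flows followed by 10 attack flows.
y_true = [0] * 990 + [1] * 10

# A degenerate "detector" that labels every flow as benign.
always_benign = [0] * 1000

# A hypothetical detector: 20 false alarms, and 8 of the 10 attacks caught.
detector = [1] * 20 + [0] * 970 + [1] * 8 + [0] * 2

for name, y_pred in [("always benign", always_benign), ("detector", detector)]:
    c = confusion(y_true, y_pred)
    print(
        f"{name}: accuracy={accuracy(c):.3f} "
        f"balanced_accuracy={balanced_accuracy(c):.3f} "
        f"precision={precision(c):.3f} recall={recall(c):.3f}"
    )
```

In this toy setting the always-benign baseline has the higher plain accuracy even though it misses every attack, while balanced accuracy and recall separate the two cases. This is the kind of behavior that imbalance-aware metrics are intended to expose.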

The study provides a transparent and reproducible baseline for intrusion detection research inspired by honeypot traffic analysis. While results are based on simulated network data, the methodology and findings may inform future work involving live network traffic, deployed honeypots, or adaptive learning approaches.

Files

Catching the Rare Ensemble and Linear Models for Imbalanced Network Intrusion Detection.pdf

Additional details

Dates

Created
2025-10-20

Software

Repository URL
https://github.com/harddikk/Catching-the-Rare
Programming language
Python, Jupyter Notebook
Development Status
Active