Published August 16, 2022 | Version v1
Conference paper Open

Towards Continuous Enrichment of Cyber Threat Intelligence: A Study on a Honeypot Dataset


Cyber Threat Intelligence helps organisations in their fight against cyber threats by continuously providing information about the cyber threat landscape, so that they can design their defences strategically and support decision making. In this context, honeypots are a widespread solution for gathering intelligence about threat actors. However, honeypots do not inherently provide information about the origin of threat groups, their resources, capabilities, and potential impact. Thus, we propose an approach that classifies threats as highly or less abusive, based on their behavioural characteristics, using four ensemble machine learning algorithms applied to security incidents identified in a rule-based manner on a deployed honeypot. After preprocessing and hyperparameter tuning, the four examined models, Random Forest Classifier (RFC), Adaptive Boosting Classifier (AdaBoost), Light Gradient Boosting Machine (LGBM) and Extreme Gradient Boosting (XGBoost), achieve good results, with RFC and LGBM achieving the best recall (84% and 83%, respectively) and LGBM and XGBoost the best AUC (91% and 90%, respectively).
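For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how recall and AUC can be computed for a binary "highly abusive" versus "less abusive" classifier. It is not the paper's evaluation code: the function names `recall` and `roc_auc`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made purely for illustration.

```python
def recall(y_true, y_pred):
    """Fraction of actual positives (label 1) that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive receives a higher score than a randomly chosen negative
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = highly abusive, 0 = less abusive.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]  # assumed model scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed 0.5 threshold

print(recall(y_true, y_pred))   # 0.75
print(roc_auc(y_true, scores))  # 0.9375
```

Recall depends on the chosen decision threshold, whereas AUC summarises ranking quality across all thresholds, which is one reason the two metrics can favour different models.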


This is the accepted version of the paper. The final version can be found on


Towards Continuous Enrichment of Cyber Threat Intelligence A Study on a Honeypot Dataset_accepted_v02.pdf

Additional details


FORESIGHT – Advanced cyber-security simulation platform for preparedness training in Aviation, Naval and Power-grid environments (European Commission, grant 833673)
ECHO – European network of Cybersecurity centres and competence Hub for innovation and Operations (European Commission, grant 830943)