Published August 26, 2019 | Version v1
Conference paper Open

Towards a framework for detecting advanced Web bots

  • 1. Information Technologies Institute, CERTH, Thessaloniki, Greece. Department of Computing and Informatics, Bournemouth University, Bournemouth, United Kingdom.
  • 2. Department of Computing and Informatics, Bournemouth University, Bournemouth, United Kingdom.
  • 3. Information Technologies Institute, CERTH, Thessaloniki, Greece.


Automated programs (bots) are responsible for a large percentage of website traffic. These bots can be used either for benign purposes, such as Web indexing, website monitoring (validation of hyperlinks and HTML code), feed fetching, and Web content and data extraction for commercial use, or for malicious ones, including, but not limited to, content scraping, vulnerability scanning, account takeover, distributed denial-of-service attacks, marketing fraud, carding and spam. To ensure their security, Web servers try to identify bot sessions and apply special rules to them, such as throttling their requests or delivering different content. The methods currently used to identify bots rely either purely on rule-based detection techniques or on a combination of rule-based and machine learning techniques. While current research has developed highly adequate methods for Web bot detection, the adequacy of these methods when faced with Web bots that try to remain undetected has not been studied. For this reason, we created a Web bot detection framework and evaluated its ability to detect conspicuous bots separately from its ability to detect advanced Web bots. We assessed the performance of the proposed framework using real HTTP traffic from a public Web server. Our experimental results show that, using HTTP Web logs, the proposed framework has a significant ability to detect Web bots that do not try to hide their bot identity (balanced accuracy > 95% on a false-positive-intolerant server). However, detecting advanced Web bots that present a browser fingerprint, and may also exhibit humanlike behaviour, is considerably more difficult.
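The balanced accuracy figure quoted above can be read as the mean of the detection rate on bot sessions and the detection rate on human sessions. The following is a minimal sketch of that standard metric only; the function name, the label encoding (1 = bot, 0 = human) and the sample labels are illustrative assumptions, not the framework's evaluation code or data.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of recall on the bot class (1) and recall on the human class (0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # bots flagged as bots
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # bots missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # humans left alone
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # humans wrongly flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    true_positive_rate = tp / (tp + fn)
    true_negative_rate = tn / (tn + fp)
    return (true_positive_rate + true_negative_rate) / 2


# Hypothetical per-session labels, invented for illustration: 1 = bot, 0 = human.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
score = balanced_accuracy(y_true, y_pred)
print(f"balanced accuracy: {score:.4f}")
```

Because each class contributes equally to the score, a detector cannot reach a high value simply by favouring the majority class, which matters when bot and human sessions are unevenly represented in the logs.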



Files (3.9 MB)

Additional details


IDEAL-CITIES – Intelligence-Driven Urban Internet-of-Things Ecosystems for Trustworthy and Circular Smart Cities 778229
European Commission
TENSOR – Retrieval and Analysis of Heterogeneous Online Content for Terrorist Activity Recognition 700024
European Commission