Web robot detection - Server logs

Lagopoulos, Athanasios; Tsoumakas, Grigorios

doi:10.5281/zenodo.3477932

Published October 9, 2019 | Version v1

Dataset, open access

1. Aristotle University of Thessaloniki

This dataset contains server logs from the search engine of the Library and Information Center of the Aristotle University of Thessaloniki in Greece (http://search.lib.auth.gr/). The search engine lets users check the availability of books and other written works and search for digitized material and scientific publications. The server logs span an entire month, from March 1 to March 31, 2018, and consist of 4,091,155 requests, with an average of 131,973 requests per day and a standard deviation of 36,996.7 requests. In total, there are requests from 27,061 unique IP addresses and 3,441 unique user-agent strings. The server logs are in JSON format and are anonymized by masking the last 6 digits of the IP address and by hashing the last part of each requested URL (the part after the last /). The dataset also contains the processed form of the server logs as a labelled dataset of log entries grouped into sessions, along with their extracted features (simple and semantic features). We make this dataset, the first of its kind in this domain, publicly available to provide a common ground for testing web robot detection methods, as well as other methods that analyze server logs.
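The anonymization described above can be illustrated with a minimal Python sketch. It is not the authors' actual procedure: the mask character ("x"), the choice of SHA-256, and the function names mask_ip and hash_last_segment are all assumptions made for illustration, since the description does not specify them.

```python
import hashlib


def mask_ip(ip: str, n_digits: int = 6, mask_char: str = "x") -> str:
    """Replace the last n_digits digits of an IP address with mask_char.

    Dots are kept in place. The mask character is an assumption; the
    dataset description does not say which one was used.
    """
    out = []
    remaining = n_digits
    for ch in reversed(ip):
        if ch.isdigit() and remaining > 0:
            out.append(mask_char)
            remaining -= 1
        else:
            out.append(ch)
    return "".join(reversed(out))


def hash_last_segment(url: str) -> str:
    """Hash the part of the URL that follows the last '/'.

    SHA-256 is an assumption; the description only says the last part
    of the URL is hashed. URLs without a '/' or ending in '/' are
    returned unchanged.
    """
    head, sep, tail = url.rpartition("/")
    if not sep or not tail:
        return url
    digest = hashlib.sha256(tail.encode("utf-8")).hexdigest()
    return head + sep + digest


if __name__ == "__main__":
    print(mask_ip("192.168.123.45"))
    print(hash_last_segment("/record/12345"))
```

Masking keeps the leading part of the address, so requests from the same network prefix remain grouped, while hashing keeps identical URLs comparable without exposing the requested item.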

Files

Files (3.2 GB)

Name	MD5	Size
public_v2.json	2f126f1f33a8cb7851c2863094998788	3.2 GB
semantic_features.csv	8c2c2daef9aeb5c21f153b55caedc005	4.3 MB
simple_features.csv	08c9b2efeddd5c15e1e4f66d127408f8	15.9 MB
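After downloading, the files can be checked against the MD5 checksums listed above. The following Python sketch is one way to do this. The helper names md5_of_file and verify are hypothetical, and the sketch assumes the files were saved under their original names in a single directory.

```python
import hashlib
from pathlib import Path

# Checksums as listed in the file table of this record.
EXPECTED_MD5 = {
    "public_v2.json": "2f126f1f33a8cb7851c2863094998788",
    "semantic_features.csv": "8c2c2daef9aeb5c21f153b55caedc005",
    "simple_features.csv": "08c9b2efeddd5c15e1e4f66d127408f8",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that the 3.2 GB JSON file is never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Return {file name: True/False}; False if missing or mismatched."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify(".").items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again.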

Additional details

Is compiled by: Journal article: 10.1007/s10489-020-01754-9 (DOI)

Lagopoulos, A., & Tsoumakas, G. (2020). Content-aware web robot detection. Applied Intelligence, 50(11), 4017-4028.

	All versions	This version
Views	2,651	2,643
Downloads	1,721	1,720
Data volume	3.0 TB	3.0 TB
