Published June 18, 2025 | Version v2
Dataset Open

AOL4FOLTR

  • 1. Delft University of Technology
  • 2. University of Amsterdam

Description

AOL4FOLTR is the first learning-to-rank (LTR) dataset designed specifically for evaluating federated online learning-to-rank (FOLTR) algorithms.

Because it includes user identifiers and timestamps, the dataset supports simulating real user behavior, both with heterogeneous data across users and in asynchronous federated learning settings.

The dataset consists of two files:

  • letor.txt.gz (55 GB uncompressed)
  • metadata.csv

letor.txt contains the query-document pairs for all query logs in standard LETOR format. Each query-document pair carries a binary label derived from user clicks and is represented by a 103-dimensional feature vector. The features are documented in the code repository (https://github.com/mg98/aol4foltr).
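As a rough illustration of reading the file, the sketch below streams letor.txt.gz and parses each line. It assumes the common LETOR layout `<label> qid:<id> <index>:<value> ... # optional comment`; the exact layout of this file (for example whether a trailing comment is present) is not confirmed here, and the names `LetorRow`, `parse_letor_line`, and `iter_letor` are hypothetical helpers, not part of the dataset's code.

```python
import gzip
from typing import Dict, Iterator, NamedTuple


class LetorRow(NamedTuple):
    label: int                  # binary click-derived label
    qid: str                    # query ID, cross-referenced in metadata.csv
    features: Dict[int, float]  # feature index -> value


def parse_letor_line(line: str) -> LetorRow:
    """Parse one line, assuming '<label> qid:<id> <idx>:<val> ... # comment'."""
    # Drop an optional trailing comment introduced by '#'.
    body = line.split("#", 1)[0].strip()
    tokens = body.split()
    label = int(float(tokens[0]))
    if not tokens[1].startswith("qid:"):
        raise ValueError(f"expected 'qid:<id>' as second token, got {tokens[1]!r}")
    qid = tokens[1][len("qid:"):]
    features: Dict[int, float] = {}
    for token in tokens[2:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return LetorRow(label, qid, features)


def iter_letor(path: str) -> Iterator[LetorRow]:
    """Stream rows from the gzip file without decompressing it to disk."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            # Skip blank lines and lines that contain only a comment.
            if line.split("#", 1)[0].strip():
                yield parse_letor_line(line)
```

Streaming line by line keeps memory use flat, which matters given the uncompressed size of the file; a loop such as `for row in iter_letor("letor.txt.gz"): ...` processes one query-document pair at a time.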

The query logs are cross-referenced (by qid) in metadata.csv, which provides contextual information for each query: the user, timestamp, raw query, target document ID, and a list of 20 candidate documents.
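One way to use this metadata for a federated simulation is to treat each user as a client and replay that user's queries in chronological order. The sketch below groups query IDs per user. The column names `user_id`, `time`, and `qid` are assumptions (check the header of metadata.csv and adjust the parameters), and the sort assumes timestamps are strings that order correctly when compared lexicographically, such as `YYYY-MM-DD HH:MM:SS`.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def group_queries_by_user(
    rows: Iterable[Mapping[str, str]],
    user_col: str = "user_id",  # assumed column name
    time_col: str = "time",     # assumed column name
    qid_col: str = "qid",       # assumed column name
) -> Dict[str, List[str]]:
    """Return {user: [qid, ...]} with each user's qids in timestamp order."""
    per_user: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for row in rows:
        per_user[row[user_col]].append((row[time_col], row[qid_col]))
    # Sort by timestamp only; the stable sort keeps file order on ties.
    return {
        user: [qid for _, qid in sorted(entries, key=lambda entry: entry[0])]
        for user, entries in per_user.items()
    }


def load_user_logs(path: str) -> Dict[str, List[str]]:
    """Read metadata.csv and group its query IDs per user."""
    with open(path, newline="", encoding="utf-8") as handle:
        return group_queries_by_user(csv.DictReader(handle))
```

The resulting per-user qid lists can then be used to select the matching rows from letor.txt for each simulated client.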

The document IDs and user IDs map directly to the AOL-IA dataset; the query IDs do not. For access to the raw document contents, please refer to the AOL-IA dataset.

Files

Files (10.0 GB)

  • 9.0 GB, md5:75362ad22b58cdf2b1252ea325f6beb1
  • 1.0 GB, md5:bbd1cdfeec45120b9d8b0423dc5bd003
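
The md5 values listed above can be used to check a download for corruption. A minimal sketch using Python's standard `hashlib` follows; the function names `md5_of_file` and `matches_listed_md5` are hypothetical, and the file is read in chunks so that multi-gigabyte files do not need to fit in memory.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path: str, listed: str) -> bool:
    """Compare a file against a listed value such as 'md5:<hex digest>'."""
    # Accept the value with or without the 'md5:' prefix.
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

Call `matches_listed_md5(path, listed)` with the path of a downloaded file and the md5 value shown for that file on the record page; a `False` result suggests the download is incomplete or corrupted.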

Additional details

Additional titles

Subtitle
A Large-Scale Web Search Dataset for Federated Online Learning to Rank

Funding

Dutch Research Council
BLOCK.2019.004

Software

Repository URL
https://github.com/mg98/aol4foltr
Programming language
Python