Published May 12, 2026 | Version v1
Dataset Restricted

iSearch++: An Augmented State-of-the-Art Information Retrieval Test Collection for Integrated Academic Search

  • 1. TH Köln - University of Applied Sciences

Description

The iSearch test collection remains a unique resource for evaluating information access systems such as academic search engines. Built following the Cranfield evaluation paradigm, it combines arXiv full texts and metadata with detailed descriptions of users’ information needs across different expertise levels. Although the collection is now over 15 years old and relatively small by modern standards (~160,000 documents), its structured relevance assessments make it an ideal foundation for evaluating contemporary systems.

The iSearch++ project aims to modernize this dataset by improving full-text extraction (e.g., table extraction), re-evaluating relevance using LLM-as-a-Judge methods, integrating the collection into the ir_datasets framework, and aligning it with FAIR principles. Within the NFDIxCS context, iSearch++ demonstrates how legacy research datasets can be updated to meet current technical and accessibility standards while preserving their original research value.
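One way to compare LLM-as-a-Judge labels with the original relevance assessments is to measure chance-corrected agreement. The following is a minimal sketch, not the project's method: it assumes both label sets are available as qrels-style mappings from (query ID, document ID) to a relevance label. The function names, the binary labels, and the toy values are hypothetical and chosen only for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two equal-length sequences of labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from the marginal label counts.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


def agreement_on_shared_pairs(qrels_original, qrels_llm):
    """Kappa over the (query_id, doc_id) pairs judged in both mappings."""
    shared = sorted(set(qrels_original) & set(qrels_llm))
    return cohen_kappa(
        [qrels_original[pair] for pair in shared],
        [qrels_llm[pair] for pair in shared],
    )


# Hypothetical toy judgments (not taken from iSearch or iSearch++).
original = {("q1", "d1"): 1, ("q1", "d2"): 0, ("q2", "d3"): 1, ("q2", "d4"): 0}
llm_judge = {("q1", "d1"): 1, ("q1", "d2"): 0, ("q2", "d3"): 0, ("q2", "d4"): 0}

kappa = agreement_on_shared_pairs(original, llm_judge)
print(f"Cohen's kappa on shared pairs: {kappa:.2f}")
```

For graded relevance labels, a weighted agreement measure would likely be more appropriate than the unweighted kappa sketched here.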

The accompanying software is publicly available on GitHub via the reference landing page.

This work is funded by the German Research Foundation (DFG) as part of the NFDIxCS consortium (Grant number: 501930651).

Files

Restricted

The record is publicly accessible, but files are restricted.

Additional details

Dates

Other
2026-05-12