iSearch++: An Augmented State-of-the-Art Information Retrieval Test Collection for Integrated Academic Search
Description
The iSearch test collection remains a unique resource for evaluating information access systems such as academic search engines. Built following the Cranfield evaluation paradigm, it combines arXiv full texts and metadata with detailed descriptions of users’ information needs across different expertise levels. Although the collection is now over 15 years old and relatively small by modern standards (~160,000 documents), its structured relevance assessments make it an ideal foundation for evaluating contemporary systems.
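As a rough illustration of how Cranfield-style relevance assessments are typically used, the sketch below scores a ranked result list against a set of documents judged relevant for one topic. The topic ID, document IDs, and ranking are hypothetical placeholders, not data from iSearch, and binary relevance is assumed for simplicity.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked documents that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc_id in ranking[:k] if doc_id in relevant) / k


def average_precision(ranking, relevant):
    """Mean of the precision values at each rank where a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


# Hypothetical judgments and system output for a single topic (not iSearch data).
qrels = {"topic-1": {"d1", "d4"}}
run = {"topic-1": ["d1", "d2", "d4", "d3"]}

p_at_2 = precision_at_k(run["topic-1"], qrels["topic-1"], k=2)
ap = average_precision(run["topic-1"], qrels["topic-1"])
```

Averaging such per-topic scores over all topics gives the system-level figures usually reported in this kind of evaluation.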
The iSearch++ project aims to modernize this dataset by improving full-text extraction (e.g., table extraction), re-evaluating relevance using LLM-as-a-Judge methods, integrating the collection into the ir_datasets framework, and aligning it with FAIR principles. Within the NFDIxCS context, iSearch++ demonstrates how legacy research datasets can be updated to meet current technical and accessibility standards while preserving their original research value.
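One common way to check LLM-generated relevance labels against existing human assessments is a chance-corrected agreement measure such as Cohen's kappa. The sketch below is a minimal illustration under that assumption; it is not the project's actual evaluation procedure, and the label lists are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two label sequences of equal length."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both assessors agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both assessors labelled independently
    # according to their own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both assessors used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical binary relevance labels for four topic-document pairs.
human_labels = [1, 0, 1, 0]
llm_labels = [1, 0, 0, 0]

kappa = cohens_kappa(human_labels, llm_labels)
```

Values near 1 indicate strong agreement, while values near 0 indicate agreement no better than chance.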
The software is publicly available on GitHub via the referenced landing page.
This work is funded by the German Research Foundation (DFG) as part of the NFDIxCS consortium (Grant number: 501930651).
Additional details
Dates
- Other: 2026-05-12