A COMPARATIVE STUDY OF INFORMATION RETRIEVAL MODELS: BM25 VERSUS HYBRID RETRIEVAL ON THE CRANFIELD COLLECTION

Aashish Dhakal; Nabin Chaulagain; Shashank Shree Neupane

doi:10.5281/zenodo.18221720

Published January 12, 2026 | Version v1

Journal article | Open access

Information retrieval systems face the fundamental challenge of balancing retrieval effectiveness with computational efficiency. Traditional lexical models like BM25 provide fast retrieval but may miss semantic relationships. Modern hybrid approaches combine lexical and semantic methods and promise improved effectiveness, but at the cost of greater computational complexity. This study compares BM25 and a hybrid retrieval model on the Cranfield collection, evaluating both retrieval effectiveness and computational efficiency. We implemented two retrieval systems: (1) BM25 with pseudo-relevance feedback and tuned parameters (k1 = 1.6, b = 0.4), and (2) a hybrid model combining BM25, dense retrieval using SPECTER embeddings, Reciprocal Rank Fusion, and cross-encoder reranking. Both systems were evaluated on the 225 Cranfield queries using standard IR metrics: Precision@10, Recall@20, Mean Average Precision (MAP), and NDCG@20. The hybrid model demonstrated superior retrieval effectiveness across all metrics. It achieved a 41% relative improvement in Precision@10 (1.07% vs 0.76%) and a 67% relative improvement in Recall@20 (1.54% vs 0.92%). However, BM25 was dramatically more efficient, processing queries roughly 800x faster (15.3 ms vs 12.4 s average latency). Although hybrid retrieval models achieve better effectiveness, their computational cost may limit their applicability to real-time scenarios. The Cranfield collection's inherent difficulty (1960s terminology, sparse relevance judgments) constrains absolute performance regardless of model sophistication.
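The abstract names BM25 with k1 = 1.6 and b = 0.4 plus pseudo-relevance feedback, but it does not give the scoring details. The following is a minimal Python sketch of Okapi BM25 that rests on three assumptions:

- a lowercase whitespace tokenizer;
- the Lucene-style non-negative IDF;
- a deliberately simple feedback step (`prf_expand`, a hypothetical helper) that appends the most frequent non-query terms from the top-ranked documents.

None of these choices is confirmed as the authors' implementation.

```python
import math
from collections import Counter

K1, B = 1.6, 0.4  # parameter values reported in the abstract


def tokenize(text):
    # Hypothetical tokenizer: lowercase + whitespace split (no stemming, no stopwords)
    return text.lower().split()


class BM25:
    def __init__(self, docs, k1=K1, b=B):
        self.k1, self.b = k1, b
        self.docs = [Counter(tokenize(d)) for d in docs]  # term frequencies per doc
        self.lengths = [sum(tf.values()) for tf in self.docs]
        self.n = len(self.docs)
        self.avgdl = sum(self.lengths) / self.n
        self.df = Counter()  # document frequency of each term
        for tf in self.docs:
            self.df.update(tf.keys())

    def idf(self, term):
        # Lucene-style non-negative IDF (an assumption; other variants exist)
        df = self.df.get(term, 0)
        return math.log(1 + (self.n - df + 0.5) / (df + 0.5))

    def score(self, query_terms, i):
        tf, dl = self.docs[i], self.lengths[i]
        s = 0.0
        for t in query_terms:
            f = tf.get(t, 0)
            if f:
                # b controls length normalization; k1 controls tf saturation
                norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                s += self.idf(t) * f * (self.k1 + 1) / norm
        return s

    def search(self, query_terms, k=10):
        # Rank all documents, best first; ties broken by document index
        scored = [(self.score(query_terms, i), i) for i in range(self.n)]
        scored.sort(key=lambda x: (-x[0], x[1]))
        return [i for s, i in scored[:k] if s > 0]


def prf_expand(index, query_terms, fb_docs=3, fb_terms=5):
    # Hypothetical pseudo-relevance feedback: assume the top fb_docs results are
    # relevant and add their most frequent terms that are not already in the query.
    pool = Counter()
    for i in index.search(query_terms, k=fb_docs):
        pool.update(index.docs[i])
    extra = [t for t, _ in pool.most_common() if t not in query_terms][:fb_terms]
    return list(query_terms) + extra
```

Consider a toy corpus in which two documents match "boundary layer" equally. There, `BM25(docs).search(tokenize("boundary layer"))` ranks the shorter document first, because b = 0.4 applies a moderate length penalty. The expanded query from `prf_expand` would then be searched again to produce the final ranking.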
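Reciprocal Rank Fusion merges the BM25 and dense (SPECTER) rankings using only rank positions. Each document's fused score is the sum of 1/(k + rank) over the lists in which it appears. The sketch below assumes the commonly used constant k = 60, which the abstract does not state. The dense retriever and the cross-encoder reranking stage are not shown; in the described pipeline, the fused list would then be passed to the reranker.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids (best first) into one ranking.

    k=60 is a common default; the value used in the study is an assumption here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Consider `bm25 = ["d1", "d2", "d3"]` and `dense = ["d3", "d1", "d4"]`. Their fused order is `["d1", "d3", "d2", "d4"]`: documents retrieved by both systems rise above those found by only one. Because RRF ignores raw scores, it needs no calibration between BM25 scores and embedding similarities.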
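The four evaluation metrics can be computed per query as shown below. MAP and the other reported averages are then the mean over the 225 queries. This sketch makes two assumptions:

- Precision@k, Recall@k, and average precision use binary relevance, so any positive grade counts as relevant.
- NDCG uses linear graded gains with a log2 discount.

The abstract does not specify how the study mapped Cranfield's graded judgments to these metrics.

```python
import math


def precision_at_k(ranked, relevant, k):
    # Fraction of the top-k results that are relevant
    return sum(1 for d in ranked[:k] if d in relevant) / k


def recall_at_k(ranked, relevant, k):
    # Fraction of all relevant documents retrieved in the top k
    if not relevant:
        return 0.0
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)


def average_precision(ranked, relevant):
    # Mean of precision values at each rank where a relevant doc appears;
    # relevant docs never retrieved contribute zero. MAP = mean over queries.
    hits, total = 0, 0.0
    for i, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


def ndcg_at_k(ranked, grades, k):
    # grades: dict doc_id -> graded relevance (0 or missing = non-relevant)
    dcg = sum(grades.get(d, 0) / math.log2(i + 1)
              for i, d in enumerate(ranked[:k], start=1))
    ideal = sorted(grades.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 1) for i, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


def mean(values):
    return sum(values) / len(values)
```

Note that Recall@20 is bounded by the number of judged-relevant documents per query. This is one way sparse relevance judgments can cap absolute scores.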
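The percentage improvements in the abstract are relative gains over the BM25 baseline, not absolute differences. The latency ratio works out to about 810x, which the abstract rounds to 800x. A quick check using only the reported figures:

```python
def relative_improvement(new, old):
    """Relative gain of `new` over the baseline `old`."""
    return (new - old) / old


# Figures reported in the abstract (hybrid vs BM25)
print(f"Precision@10: {relative_improvement(1.07, 0.76):.0%}")  # Precision@10: 41%
print(f"Recall@20:    {relative_improvement(1.54, 0.92):.0%}")  # Recall@20:    67%
# Average latency: 12.4 s (hybrid) vs 15.3 ms (BM25)
print(f"Speedup:      {12.4 / 0.0153:.0f}x")  # Speedup:      810x
```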

Files

JAN19.pdf (336.3 kB; md5: b9fee9a986f5a85c251c1ac55f39bfc4)

Additional details

Repository URL: https://ijetrm.com/issues/files/Jan-2026-12-1768220730-JAN19.pdf
