Webis Rank-DistiLLM

Schlatt, Ferdinand

doi:10.5281/zenodo.12528410

Published July 25, 2024 | Version v3

Dataset Open

Schlatt, Ferdinand (Researcher)¹

1. Friedrich Schiller University Jena

This dataset contains run files for MS MARCO passage training queries, re-ranked by RankGPT-4 Turbo or RankZephyr. These run files can be used to distill smaller, more efficient models while preserving effectiveness.

The files __colbert__msmarco-passage-train-judged.run and __bm25__msmarco-passage-train-judged.run contain the top 500 passages retrieved by ColBERTv2 and BM25, respectively, for every query in the MS MARCO training query set that has at least one relevance judgment.
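The description does not state the on-disk layout of the run files. As a minimal sketch, assuming they follow the common six-column TREC run format (query ID, Q0, passage ID, rank, score, tag), the first-stage files could be read line by line instead of being loaded into memory, which matters at 9.5 GB and 13.3 GB. The names RunEntry and iter_run are illustrative and not part of the dataset.

```python
from typing import Iterable, Iterator, NamedTuple


class RunEntry(NamedTuple):
    """One ranked passage for one query (hypothetical helper type)."""
    query_id: str
    doc_id: str
    rank: int
    score: float


def iter_run(lines: Iterable[str]) -> Iterator[RunEntry]:
    """Yield entries from an iterable of lines, one entry per valid line.

    ASSUMPTION: each line has the six whitespace-separated TREC run columns
    `query_id Q0 doc_id rank score tag`. Lines with a different number of
    columns (for example blank lines) are skipped.
    """
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue
        query_id, _q0, doc_id, rank, score, _tag = parts
        yield RunEntry(query_id, doc_id, int(rank), float(score))
```

Because iter_run accepts any iterable of lines, an open file handle can be passed directly, for example `with open(path) as f: for entry in iter_run(f): ...`.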

All other files are sub-sampled from these run files and re-ranked by either RankGPT-4 Turbo or RankZephyr. A file's name reveals which LLM was used for re-ranking, which first-stage retrieval model was used, how many queries were re-ranked, and to which depth the rankings were sampled. For example, the file __rankzephyr-colbert-10000-sampled-100__msmarco-passage-train-judged.run contains the top 100 passages retrieved by ColBERTv2 for 10,000 queries, re-ranked by RankZephyr.
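The naming scheme described above can be unpacked programmatically. The following is a minimal sketch, assuming the pattern `__<reranker>-<first stage>-<queries>-sampled-<depth>__<dataset>.run` inferred from the listed file names. The function name parse_run_filename and the dictionary keys are illustrative. Files that do not follow this pattern, such as the two first-stage run files, yield None.

```python
import re
from typing import Optional

# Pattern inferred from the listed file names; it is an assumption, not an
# official specification of the naming scheme.
_RUN_NAME = re.compile(
    r"__(?P<reranker>[a-z0-9]+)-(?P<first_stage>[a-z0-9]+)"
    r"-(?P<num_queries>\d+)-sampled-(?P<depth>\d+)"
    r"__(?P<dataset>.+)\.run"
)


def parse_run_filename(name: str) -> Optional[dict]:
    """Split a re-ranked run file name into its components.

    Returns None when the name does not match the assumed pattern.
    """
    match = _RUN_NAME.fullmatch(name)
    if match is None:
        return None
    info = match.groupdict()
    info["num_queries"] = int(info["num_queries"])
    info["depth"] = int(info["depth"])
    return info


info = parse_run_filename(
    "__rankzephyr-colbert-10000-sampled-100__msmarco-passage-train-judged.run"
)
print(info)
```

For the example file this reports the re-ranker rankzephyr, the first stage colbert, 10000 queries, and a depth of 100.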

Files (23.4 GB)

Name	MD5	Size
__bm25__msmarco-passage-train-judged.run	835372b2ab4d20acf10addeae526c559	13.3 GB
__colbert__msmarco-passage-train-judged.run	6ed152027f7270f32fcbfaaa6def951e	9.5 GB
__rankgpt-colbert-2000-sampled-100__msmarco-passage-train-judged.run	350494570d6e21d46999974c61a8cf72	8.2 MB
__rankgpt-colbert-2000-sampled-10__msmarco-passage-train-judged.run	32f00d2052dd1b612866216899370ae3	769.2 kB
__rankgpt-colbert-2000-sampled-20__msmarco-passage-train-judged.run	1e5be4ee19aba9ef209ad6fa98a9e47b	1.6 MB
__rankgpt-colbert-2000-sampled-50__msmarco-passage-train-judged.run	d59e9d51a604ab56e6ce080b5cbe8c24	4.1 MB
__rankzephyr-bm25-10000-sampled-100__msmarco-passage-train-judged.run	05e3137ea3526671e1565cc90f9a2c8a	28.7 MB
__rankzephyr-colbert-1000-sampled-100__msmarco-passage-train-judged.run	11e4de19c244220fc493b8a050f075ee	2.9 MB
__rankzephyr-colbert-10000-sampled-100__msmarco-passage-train-judged.run	49f8dbf2c1ee7a2ca1fe517eda528af6	28.7 MB
__rankzephyr-colbert-10000-sampled-10__msmarco-passage-train-judged.run	619bc815bd133bdca44d6331b241d39a	2.8 MB
__rankzephyr-colbert-10000-sampled-20__msmarco-passage-train-judged.run	372ab599b07adfbceef44f2741b0eaa0	5.7 MB
__rankzephyr-colbert-10000-sampled-50__msmarco-passage-train-judged.run	c37b78874d4893a00566ab40aa453c56	14.4 MB
__rankzephyr-colbert-2000-sampled-100__msmarco-passage-train-judged.run	d18ed2c4b3f14fd77c03f7d7a8bfafbf	5.8 MB
__rankzephyr-colbert-5000-sampled-100__msmarco-passage-train-judged.run	4e6a69d5faabd63d5d694d8db2b55b0d	14.4 MB
__set-encoder-colbert__msmarco-passage-train-judged.run.gz	1f069d0daa9842a54a858cc660149e1a	520.9 MB
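
Each file in the listing above comes with an MD5 checksum, which can be used to confirm that a download completed intact. The following is a minimal sketch using Python's standard hashlib module. The helper names md5_of_stream and md5_of_file are illustrative, and the chunked read is a design choice so that the multi-gigabyte files never need to fit in memory.

```python
import hashlib
from typing import BinaryIO


def md5_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path: str) -> str:
    """Return the hex MD5 digest of the file at `path`."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle)
```

A downloaded copy of __bm25__msmarco-passage-train-judged.run would be considered intact if md5_of_file returns 835372b2ab4d20acf10addeae526c559 for it.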
