There is a newer version of the record available.

Published July 25, 2024 | Version v3
Dataset Open

Webis Rank-DistiLLM

  • 1. Friedrich Schiller University Jena

Description

This dataset contains run files for MS MARCO passage training queries, re-ranked by RankGPT-4 Turbo or RankZephyr. These run files can be used to distill smaller, more efficient models while retaining effectiveness.

The files __colbert__msmarco-passage-train-judged.run and __bm25__msmarco-passage-train-judged.run contain the top 500 passages retrieved by ColBERTv2 and BM25, respectively, for every query in the MS MARCO training query set that has at least one relevance judgement.
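As a rough illustration of how such a run file could be consumed, the sketch below streams lines and keeps the best-ranked passages per query. It assumes the files follow the common six-column TREC run layout (query ID, a placeholder column, passage ID, rank, score, run tag); the record itself does not state the column layout, so treat this as an assumption to check against the first few lines of a file. The names `RunEntry`, `parse_run_lines`, and `top_k_per_query` are hypothetical helpers introduced for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, NamedTuple, Tuple


class RunEntry(NamedTuple):
    query_id: str
    doc_id: str
    rank: int
    score: float


def parse_run_lines(lines: Iterable[str]) -> Iterator[RunEntry]:
    """Yield one entry per non-empty line.

    Assumption: whitespace-separated TREC run columns
    (query_id, Q0, doc_id, rank, score, tag).
    """
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 6:
            raise ValueError(f"expected 6 columns, got {len(parts)}: {line!r}")
        query_id, _placeholder, doc_id, rank, score, _tag = parts
        yield RunEntry(query_id, doc_id, int(rank), float(score))


def top_k_per_query(entries: Iterable[RunEntry], k: int) -> Dict[str, List[str]]:
    """Return the k best-ranked passage IDs per query, ordered by rank."""
    by_query: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for entry in entries:
        by_query[entry.query_id].append((entry.rank, entry.doc_id))
    # Sorting by rank works whether ranks start at 0 or 1.
    return {
        qid: [doc_id for _, doc_id in sorted(pairs)[:k]]
        for qid, pairs in by_query.items()
    }
```

Because the two first-stage files are several gigabytes each, passing an open file handle to `parse_run_lines` keeps parsing lazy, for example `with open(path) as f: ranking = top_k_per_query(parse_run_lines(f), 100)`. Note that `top_k_per_query` still holds every entry in memory before truncating, so a real pipeline may want to filter by rank while streaming.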

All other files are sub-sampled from these run files and re-ranked by either RankGPT-4 Turbo or RankZephyr. A file's name reveals which LLM was used for re-ranking, which first-stage retrieval model was used, how many queries were re-ranked, and to which depth the rankings were sampled. For example, the file __rankzephyr-colbert-10000-sampled-100__msmarco-passage-train-judged.run contains the top 100 passages retrieved by ColBERTv2 for 10,000 queries, re-ranked by RankZephyr.
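The naming scheme can be unpacked programmatically. The following is a minimal sketch, assuming every re-ranked file follows the same pattern as the single example given above (`__<re-ranker>-<first stage>-<queries>-sampled-<depth>__<base name>.run`); the function `parse_run_name` and the dictionary keys are hypothetical and not part of the dataset.

```python
import re
from typing import Dict, Optional, Union

# Assumed pattern, inferred from the one example file name in the description.
_RUN_NAME = re.compile(
    r"__(?P<reranker>.+)-(?P<first_stage>[^-_]+)"
    r"-(?P<num_queries>\d+)-sampled-(?P<depth>\d+)"
    r"__(?P<base>.+)\.run"
)


def parse_run_name(file_name: str) -> Optional[Dict[str, Union[str, int]]]:
    """Split a re-ranked run file name into its components.

    Returns None for names that do not match the assumed pattern,
    such as the first-stage files that carry no re-ranker prefix.
    """
    match = _RUN_NAME.fullmatch(file_name)
    if match is None:
        return None
    return {
        "reranker": match["reranker"],
        "first_stage": match["first_stage"],
        "num_queries": int(match["num_queries"]),
        "depth": int(match["depth"]),
        "base": match["base"],
    }
```

For the example file name above, this yields the re-ranker `rankzephyr`, the first stage `colbert`, 10000 queries, and a depth of 100. The first-stage files such as __bm25__msmarco-passage-train-judged.run do not match and return `None`.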

Files (23.4 GB)

Checksum                              Size
md5:835372b2ab4d20acf10addeae526c559  13.3 GB
md5:6ed152027f7270f32fcbfaaa6def951e  9.5 GB
md5:350494570d6e21d46999974c61a8cf72  8.2 MB
md5:32f00d2052dd1b612866216899370ae3  769.2 kB
md5:1e5be4ee19aba9ef209ad6fa98a9e47b  1.6 MB
md5:d59e9d51a604ab56e6ce080b5cbe8c24  4.1 MB
md5:05e3137ea3526671e1565cc90f9a2c8a  28.7 MB
md5:11e4de19c244220fc493b8a050f075ee  2.9 MB
md5:49f8dbf2c1ee7a2ca1fe517eda528af6  28.7 MB
md5:619bc815bd133bdca44d6331b241d39a  2.8 MB
md5:372ab599b07adfbceef44f2741b0eaa0  5.7 MB
md5:c37b78874d4893a00566ab40aa453c56  14.4 MB
md5:d18ed2c4b3f14fd77c03f7d7a8bfafbf  5.8 MB
md5:4e6a69d5faabd63d5d694d8db2b55b0d  14.4 MB
md5:1f069d0daa9842a54a858cc660149e1a  520.9 MB
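
The MD5 checksums listed above can be used to confirm that a download completed intact. The sketch below hashes a file in chunks, so the multi-gigabyte run files do not need to fit in memory; `md5_of_file` and `matches_checksum` are hypothetical helper names, and the local file path is whatever you saved the download as.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, listed: str) -> bool:
    """Compare a file against a listed checksum such as 'md5:8353...'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]  # drop the prefix used in the listing
    return md5_of_file(path) == expected
```

A call such as `matches_checksum(local_path, "md5:835372b2ab4d20acf10addeae526c559")` returns `True` only if the local file's digest equals the listed one. MD5 is adequate here for detecting transfer corruption, not for security purposes.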