Webis Rank-DistiLLM
Description
This dataset contains run files for the MS MARCO passage training queries, re-ranked by RankGPT-4 Turbo or RankZephyr. These run files can be used to distill smaller, more efficient models while maintaining effectiveness.
The files __colbert__msmarco-passage-train-judged.run and __bm25__msmarco-passage-train-judged.run contain the top 500 passages retrieved by ColBERTv2 and BM25, respectively, for every query in the MS MARCO training set that has at least one relevance judgement.
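The description does not state the format of the .run files. The sketch below assumes they follow the common six-column TREC run format (query_id Q0 doc_id rank score run_tag) and groups the entries by query, so the stated retrieval depth can be checked. The function names are illustrative and not part of the dataset.

```python
from collections import defaultdict


def parse_run(lines):
    """Group TREC-style run lines by query ID.

    Assumes each non-empty line has six whitespace-separated columns:
    query_id Q0 doc_id rank score run_tag
    Returns {query_id: [(doc_id, rank, score), ...]} sorted by rank.
    """
    run = defaultdict(list)
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        qid, _q0, doc_id, rank, score, _tag = parts
        run[qid].append((doc_id, int(rank), float(score)))
    for entries in run.values():
        entries.sort(key=lambda entry: entry[1])  # rank 1 first
    return dict(run)


def read_run(path):
    """Read a run file from disk (assumed to be uncompressed UTF-8 text)."""
    with open(path, encoding="utf-8") as f:
        return parse_run(f)


def max_depth(run):
    """Largest number of passages listed for any single query."""
    return max(len(entries) for entries in run.values())
```

For example, `max_depth(read_run("__colbert__msmarco-passage-train-judged.run"))` should not exceed 500 if the file matches the description. This sketch loads the whole run into memory, which may be costly for the multi-gigabyte first-stage files.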
All other files are sub-sampled from these run files and re-ranked by either RankGPT-4 Turbo or RankZephyr. Each file's name encodes the LLM used for re-ranking, the first-stage retrieval model, the number of re-ranked queries, and the depth to which the rankings were sampled. For example, the file __rankzephyr-colbert-10000-sampled-100__msmarco-passage-train-judged.run contains the top 100 passages retrieved by ColBERTv2 for 10,000 queries, re-ranked by RankZephyr.
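The naming scheme can be parsed mechanically. The pattern below is a hypothetical one inferred from the single RankZephyr example, `__<llm>-<first_stage>-<num_queries>-sampled-<depth>__<dataset>.run`. The exact prefix used for the RankGPT-4 Turbo files is not given, so the LLM part is matched loosely and the regex may need adjusting against the real file list.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern inferred from the one example name in the description:
# __<llm>-<first_stage>-<num_queries>-sampled-<depth>__<dataset>.run
RUN_NAME_RE = re.compile(
    r"^__(?P<llm>.+?)-(?P<first_stage>[a-z0-9]+)-(?P<num_queries>\d+)"
    r"-sampled-(?P<depth>\d+)__(?P<dataset>.+)\.run$"
)


@dataclass
class RunFileInfo:
    llm: str           # re-ranking LLM, e.g. "rankzephyr"
    first_stage: str   # first-stage retrieval model, e.g. "colbert"
    num_queries: int   # number of re-ranked queries
    depth: int         # depth to which the rankings were sampled
    dataset: str       # query set the run belongs to


def parse_run_filename(name):
    """Split a re-ranked run file name into its components."""
    match = RUN_NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized run file name: {name!r}")
    return RunFileInfo(
        llm=match["llm"],
        first_stage=match["first_stage"],
        num_queries=int(match["num_queries"]),
        depth=int(match["depth"]),
        dataset=match["dataset"],
    )
```

The two first-stage files (__colbert__… and __bm25__…) contain no `-sampled-` part, so they raise `ValueError` rather than being misparsed.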
Files (23.4 GB)
MD5 checksum | Size |
---|---|
835372b2ab4d20acf10addeae526c559 | 13.3 GB |
6ed152027f7270f32fcbfaaa6def951e | 9.5 GB |
350494570d6e21d46999974c61a8cf72 | 8.2 MB |
32f00d2052dd1b612866216899370ae3 | 769.2 kB |
1e5be4ee19aba9ef209ad6fa98a9e47b | 1.6 MB |
d59e9d51a604ab56e6ce080b5cbe8c24 | 4.1 MB |
05e3137ea3526671e1565cc90f9a2c8a | 28.7 MB |
11e4de19c244220fc493b8a050f075ee | 2.9 MB |
49f8dbf2c1ee7a2ca1fe517eda528af6 | 28.7 MB |
619bc815bd133bdca44d6331b241d39a | 2.8 MB |
372ab599b07adfbceef44f2741b0eaa0 | 5.7 MB |
c37b78874d4893a00566ab40aa453c56 | 14.4 MB |
d18ed2c4b3f14fd77c03f7d7a8bfafbf | 5.8 MB |
4e6a69d5faabd63d5d694d8db2b55b0d | 14.4 MB |
1f069d0daa9842a54a858cc660149e1a | 520.9 MB |
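Each file is listed with an MD5 checksum. A minimal sketch for verifying a download against it uses only Python's standard-library `hashlib`. The function names and the path in the usage line are illustrative.

```python
import hashlib


def md5_of_stream(stream, chunk_size=1 << 20):
    """Hex MD5 digest of a binary stream, read in 1 MiB chunks by default."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """True if the file at `path` matches the expected hex MD5 checksum."""
    with open(path, "rb") as f:
        return md5_of_stream(f) == expected.strip().lower()
```

For example, `verify_md5("downloaded.run", "1f069d0daa9842a54a858cc660149e1a")` checks a local copy against the 520.9 MB entry. Reading in fixed-size chunks keeps memory use constant even for the 13.3 GB file.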