1st International Workshop on Open Web Search #wows2024 at ECIR 2024: Document Processors
Description
The First International Workshop on Open Web Search (WOWS), hosted at [ECIR 2024](https://www.ecir2024.org/), aimed to promote and discuss ideas and approaches to open up the web search ecosystem so that small research groups and young startups can leverage the web to foster an open and diverse search market. The workshop had two calls that support collaborative and open web search engines: (1) for scientific contributions, and (2) for open-source implementations. This repository collects the outputs of all document processing components submitted to the second call, run on public datasets. The second call aimed to gather open-source prototypes and gain practical experience with collaborative, cooperative evaluation of search engines and their components using the [TIREx Information Retrieval Evaluation Platform](https://www.tira.io/tirex) hosted on [TIRA](https://www.tira.io).
Citations
If you reuse the resources, please cite TIRA, TIREx, and the corresponding datasets. The corresponding BibTeX entries are:
For TIREx:
@InProceedings{froebe:2023e,
author = {Maik Fr{\"o}be and {Jan Heinrich} Reimer and Sean MacAvaney and Niklas Deckers and Simon Reich and Janek Bevendorff and Benno Stein and Matthias Hagen and Martin Potthast},
booktitle = {46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023)},
doi = {10.1145/3539618.3591888},
editor = {Hsin{-}Hsi Chen and Wei{-}Jou (Edward) Duh and Hen{-}Hsen Huang and Makoto P. Kato and Josiane Mothe and Barbara Poblete},
ids = {potthast:2023t},
isbn = {9781450394086},
month = jul,
numpages = 11,
pages = {2826--2836},
publisher = {ACM},
site = {Taipei, Taiwan},
title = {{The Information Retrieval Experiment Platform}},
url = {https://dl.acm.org/doi/10.1145/3539618.3591888},
year = 2023
}
For TIRA:
@InProceedings{froebe:2023b,
address = {Berlin Heidelberg New York},
author = {Maik Fr{\"o}be and Matti Wiegmann and Nikolay Kolyada and Bastian Grahm and Theresa Elstner and Frank Loebe and Matthias Hagen and Benno Stein and Martin Potthast},
booktitle = {Advances in Information Retrieval. 45th European Conference on {IR} Research ({ECIR} 2023)},
doi = {10.1007/978-3-031-28241-6_20},
editor = {Jaap Kamps and Lorraine Goeuriot and Fabio Crestani and Maria Maistro and Hideo Joho and Brian Davis and Cathal Gurrin and Udo Kruschwitz and Annalina Caputo},
ids = {potthast:2023h},
month = apr,
pages = {236--241},
publisher = {Springer},
series = {Lecture Notes in Computer Science},
site = {Dublin, Ireland},
title = {{Continuous Integration for Reproducible Shared Tasks with TIRA.io}},
url = {https://link.springer.com/chapter/10.1007/978-3-031-28241-6_20},
year = 2023
}
All document processors are described in the corresponding WOWS papers; please cite the papers and the underlying approaches accordingly.
Furthermore, please cite the datasets that you use.
Args.me
If you re-use the Args.me indices, please additionally cite:
@InProceedings{bondarenko:2021d,
address = {Berlin Heidelberg New York},
author = {Alexander Bondarenko and Lukas Gienapp and Maik Fr{\"o}be and Meriem Beloucif and Yamen Ajjour and Alexander Panchenko and Chris Biemann and Benno Stein and Henning Wachsmuth and Martin Potthast and Matthias Hagen},
booktitle = {Experimental IR Meets Multilinguality, Multimodality, and Interaction. 12th International Conference of the CLEF Association (CLEF 2021)},
editor = {{K. Sel{\c{c}}uk} Candan and Bogdan Ionescu and Lorraine Goeuriot and Henning M{\"u}ller and Alexis Joly and Maria Maistro and Florina Piroi and Guglielmo Faggioli and Nicola Ferro},
ids = {potthast:2021t},
month = sep,
pages = {450--467},
publisher = {Springer},
series = {Lecture Notes in Computer Science},
site = {Bucharest, Romania},
title = {{Overview of Touch{\'e} 2021: Argument Retrieval}},
volume = 12880,
year = 2021
}
@InProceedings{bondarenko:2022f,
address = {Berlin Heidelberg New York},
author = {Alexander Bondarenko and Maik Fr{\"o}be and Johannes Kiesel and Shahbaz Syed and Timon Gurcke and Meriem Beloucif and Alexander Panchenko and Chris Biemann and Benno Stein and Henning Wachsmuth and Martin Potthast and Matthias Hagen},
booktitle = {Experimental IR Meets Multilinguality, Multimodality, and Interaction. 13th International Conference of the CLEF Association (CLEF 2022)},
editor = {Alberto Barr{\'o}n-Cede{\~n}o and Giovanni Da San Martino and Mirko Degli Esposti and Fabrizio Sebastiani and Craig Macdonald and Gabriella Pasi and Allan Hanbury and Martin Potthast and Guglielmo Faggioli and Nicola Ferro},
ids = {potthast:2022j},
month = sep,
numpages = 29,
publisher = {Springer},
series = {Lecture Notes in Computer Science},
site = {Bologna, Italy},
title = {{Overview of Touch{\'e} 2022: Argument Retrieval}},
year = 2022
}
Antique
If you re-use the Antique indices, please additionally cite:
@inproceedings{hashemi:2020,
author = {Helia Hashemi and Mohammad Aliannejadi and Hamed Zamani and W. Bruce Croft},
editor = {Joemon M. Jose and Emine Yilmaz and Jo{\~{a}}o Magalh{\~{a}}es and Pablo Castells and Nicola Ferro and M{\'{a}}rio J. Silva and Fl{\'{a}}vio Martins},
title = {{ANTIQUE:} {A} Non-factoid Question Answering Benchmark},
booktitle = {Advances in Information Retrieval - 42nd European Conference on {IR} Research, {ECIR} 2020, Lisbon, Portugal, April 14-17, 2020, Proceedings, Part {II}},
series = {Lecture Notes in Computer Science},
volume = {12036},
pages = {166--173},
publisher = {Springer},
year = {2020},
}
CORD-19
If you re-use the CORD-19 indices, please additionally cite:
@article{voorhees:2020,
author = {Ellen M. Voorhees and Tasmeer Alam and Steven Bedrick and Dina Demner{-}Fushman and William R. Hersh and Kyle Lo and Kirk Roberts and Ian Soboroff and Lucy Lu Wang},
title = {{TREC-COVID:} constructing a pandemic information retrieval test collection},
journal = {{SIGIR} Forum},
volume = {54},
number = {1},
pages = {1:1--1:12},
year = {2020},
}
@article{wang:2020,
author = {Lucy Lu Wang and Kyle Lo and Yoganand Chandrasekhar and Russell Reas and Jiangjiang Yang and Darrin Eide and Kathryn Funk and Rodney Kinney and Ziyang Liu and William Merrill and Paul Mooney and Dewey A. Murdick and Devvret Rishi and Jerry Sheehan and Zhihong Shen and Brandon Stilson and Alex D. Wade and Kuansan Wang and Chris Wilhelm and Boya Xie and Douglas Raymond and Daniel S. Weld and Oren Etzioni and Sebastian Kohlmeier},
title = {{CORD-19:} The Covid-19 Open Research Dataset},
journal = {CoRR},
volume = {abs/2004.10706},
year = {2020},
eprinttype = {arXiv},
eprint = {2004.10706},
}
Cranfield
If you re-use the Cranfield indices, please additionally cite:
@inproceedings{cleverdon:1967,
title={The {C}ranfield tests on index language devices},
author={Cleverdon, Cyril},
booktitle={{ASLIB} Proceedings},
year={1967},
pages = {173--192},
organization={MCB UP Ltd. (Reprinted in Readings in Information Retrieval, Karen Sparck-Jones and Peter Willett, editors, Morgan Kaufmann, 1997)}
}
@inproceedings{cleverdon:1991,
author = {Cyril W. Cleverdon},
editor = {Abraham Bookstein and Yves Chiaramella and Gerard Salton and Vijay V. Raghavan},
title = {The Significance of the {C}ranfield Tests on Index Languages},
booktitle = {Proceedings of the 14th Annual International {ACM} {SIGIR} Conference on Research and Development in Information Retrieval. Chicago, Illinois, USA, October 13-16, 1991 (Special Issue of the {SIGIR} Forum)},
pages = {3--12},
publisher = {{ACM}},
year = {1991},
}
Medline TREC Genomics
If you re-use the Medline TREC Genomics indices, please additionally cite:
@inproceedings{hersh:2004,
author = {William R. Hersh and Ravi Teja Bhupatiraju and L. Ross and Aaron M. Cohen and Dale Kraemer and Phoebe Johnson},
editor = {Ellen M. Voorhees and Lori P. Buckland},
title = {{TREC} 2004 Genomics Track Overview},
booktitle = {Proceedings of the Thirteenth Text REtrieval Conference, {TREC} 2004, Gaithersburg, Maryland, USA, November 16-19, 2004},
series = {{NIST} Special Publication},
volume = {500-261},
publisher = {National Institute of Standards and Technology {(NIST)}},
year = {2004},
}
@inproceedings{hersh:2005,
author = {William R. Hersh and Aaron M. Cohen and Jianji Yang and Ravi Teja Bhupatiraju and Phoebe M. Roberts and Marti A. Hearst},
editor = {Ellen M. Voorhees and Lori P. Buckland},
title = {{TREC} 2005 Genomics Track Overview},
booktitle = {Proceedings of the Fourteenth Text REtrieval Conference, {TREC} 2005, Gaithersburg, Maryland, USA, November 15-18, 2005},
series = {{NIST} Special Publication},
volume = {500-266},
publisher = {National Institute of Standards and Technology {(NIST)}},
year = {2005},
}
Medline TREC Precision Medicine
If you re-use the Medline TREC Precision Medicine indices, please additionally cite:
@inproceedings{roberts:2017,
author = {Kirk Roberts and Dina Demner{-}Fushman and Ellen M. Voorhees and William R. Hersh and Steven Bedrick and Alexander J. Lazar and Shubham Pant},
editor = {Ellen M. Voorhees and Angela Ellis},
title = {Overview of the {TREC} 2017 Precision Medicine Track},
booktitle = {Proceedings of The Twenty-Sixth Text REtrieval Conference, {TREC} 2017, Gaithersburg, Maryland, USA, November 15-17, 2017},
series = {{NIST} Special Publication},
volume = {500-324},
publisher = {National Institute of Standards and Technology {(NIST)}},
year = {2017},
}
@inproceedings{roberts:2018,
author = {Kirk Roberts and Dina Demner{-}Fushman and Ellen M. Voorhees and William R. Hersh and Steven Bedrick and Alexander J. Lazar},
editor = {Ellen M. Voorhees and Angela Ellis},
title = {Overview of the {TREC} 2018 Precision Medicine Track},
booktitle = {Proceedings of the Twenty-Seventh Text REtrieval Conference, {TREC} 2018, Gaithersburg, Maryland, USA, November 14-16, 2018},
series = {{NIST} Special Publication},
volume = {500-331},
publisher = {National Institute of Standards and Technology {(NIST)}},
year = {2018},
}
MS MARCO (TREC Deep Learning 2019 and 2020)
If you re-use the MS MARCO indices, please additionally cite:
@inproceedings{craswell:2019,
author = {Nick Craswell and Bhaskar Mitra and Emine Yilmaz and Daniel Campos and Ellen M. Voorhees},
booktitle = {28th International Text Retrieval Conference, {TREC} 2019, Gaithersburg, Maryland, USA},
editor = {{Ellen M.} Voorhees and Angela Ellis},
month = nov,
title = {{Overview of the {TREC} 2019 Deep Learning Track}},
publisher = {National Institute of Standards and Technology (NIST)},
series = {NIST Special Publication},
year = {2019}
}
@inproceedings{craswell:2020,
author = {Nick Craswell and Bhaskar Mitra and Emine Yilmaz and Daniel Campos},
editor = {Ellen M. Voorhees and Angela Ellis},
title = {{Overview of the {TREC} 2020 Deep Learning Track}},
booktitle = {Proceedings of the 29th Text REtrieval Conference, {TREC} 2020, Virtual Event, Gaithersburg, MD, USA, November 16-20, 2020},
series = {{NIST} Special Publication},
volume = {1266},
publisher = {National Institute of Standards and Technology {(NIST)}},
year = {2020},
}
NFCorpus
If you re-use the NFCorpus indices, please additionally cite:
@inproceedings{boteva:2016,
author = {Vera Boteva and Demian Gholipour Ghalandari and Artem Sokolov and Stefan Riezler},
editor = {Nicola Ferro and Fabio Crestani and Marie{-}Francine Moens and Josiane Mothe and Fabrizio Silvestri and Giorgio Maria Di Nunzio and Claudia Hauff and Gianmaria Silvello},
title = {A Full-Text Learning to Rank Dataset for Medical Information Retrieval},
booktitle = {Advances in Information Retrieval - 38th European Conference on {IR} Research, {ECIR} 2016, Padua, Italy, March 20-23, 2016. Proceedings},
series = {Lecture Notes in Computer Science},
volume = {9626},
pages = {716--722},
publisher = {Springer},
year = {2016},
}
LongEval
Please cite the [corresponding dataset](https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-5151):
@misc{11234/1-5151,
title = {{LongEval} Click-Model Relevance Judgements (Qrels)},
author = {Galu{\v s}{\v c}{\'a}kov{\'a}, Petra and Devaud, Romain and Gonzalez-Saez, Gabriela and Mulhem, Philippe and Goeuriot, Lorraine and Piroi, Florina and Popel, Martin},
url = {http://hdl.handle.net/11234/1-5151},
note = {{LINDAT}/{CLARIAH}-{CZ} digital library at the Institute of Formal and Applied Linguistics ({{\'U}FAL}), Faculty of Mathematics and Physics, Charles University},
copyright = {Qwant {LongEval} Attribution-{NonCommercial}-{ShareAlike} License},
year = {2023}
}
The index can be re-used in the [LongEval 2024](https://clef-longeval.github.io/) shared task hosted at [CLEF 2024](https://clef2024.imag.fr/). The documents (and thereby the derived PyTerrier index) are under the Qwant LongEval Attribution-NonCommercial-ShareAlike License; by reusing the indices, you accept and agree to do so under this ShareAlike license.
Files
2024-03-19-17-50-12.zip
Total size of all files: 1.7 GB
| MD5 checksum | Size |
|---|---|
| 07d8fd9ab63569f6ea2a80aee86db55a | 55.8 kB |
| 6698ba27029448dac2a8a7293d0376a2 | 20.0 MB |
| f22fdcb8c25255e64a2380de42414689 | 20.0 MB |
| b019d109841dfce4db65bb315750024b | 17.6 MB |
| 00d788fd9ccc6eba51558800ba94c731 | 460.2 kB |
| 44504a96a16e4e4eaef760025cb9c91b | 63.7 MB |
| e23d2535cc67c7bafe513c40510da6fa | 95.1 MB |
| 9b97188e7d7383d1cea4dc6630c37081 | 63.7 MB |
| 047e5276a69ab7353f584ab08e817539 | 217.0 kB |
| 1895e01418d7170ac02be6ac1a9a185f | 43.7 MB |
| ea47bb4a0f7d8999db1e510d45096214 | 64.7 MB |
| 365ff525cca8302608c2df113eaad170 | 7.8 MB |
| 3d07a6c1364534a3c62825316703845a | 373.7 MB |
| 93d2166f8498bb664afc1782ffdf2106 | 507.3 MB |
| 290a3fb49d516b4dee3c1ddcf61c0884 | 97.7 MB |
| 467241170d83d8320df5207a20c95454 | 66.9 MB |
| cb813f8d7193b436132e035613f983d4 | 102.3 MB |
| 4c6a959deadcbd8a4327c10270ff5b63 | 97.7 MB |
| 0fe6df76b5fd121c395a5008c0485447 | 28.4 MB |
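A downloaded file can be checked against the MD5 checksums listed above. The following is a minimal sketch using only the Python standard library; the function names `md5_of_file` and `verify_download` are illustrative and not part of TIRA or TIREx, and the file path in the usage note is a placeholder.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 digest matches the published checksum.

    The expected value may be given with or without a leading "md5:" prefix.
    """
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `verify_download("path/to/downloaded-file.zip", "md5:<checksum from the listing>")` would return `True` only if the local copy matches the checksum published for that file. MD5 is used here because it is what the record publishes; it detects transfer corruption but is not a defence against deliberate tampering.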
Additional details
Software
- Repository URL: https://github.com/OpenWebSearch/wows-code/tree/main/ecir24
- Programming language: Python
- Development status: Active