10.5281/zenodo.5575285
https://zenodo.org/records/5575285
oai:zenodo.org:5575285
Gienapp, Lukas
Lukas
Gienapp
0000-0001-5707-3751
Leipzig University
Kircheis, Wolfgang
Wolfgang
Kircheis
0000-0002-0925-0503
Leipzig University
Sievers, Bjarne
Bjarne
Sievers
0000-0001-7763-7075
Leipzig University
Stein, Benno
Benno
Stein
Bauhaus-Universität Weimar
Potthast, Martin
Martin
Potthast
0000-0003-2451-0665
Leipzig University
Webis-STEREO-21
Zenodo
2021
2021-10-18
eng
arXiv:2112.11800
10.5281/zenodo.5575284
https://zenodo.org/communities/webis
1.0.0
Creative Commons Attribution 4.0 International
We present the Webis-STEREO-21 dataset, a massive collection of Scientific Text Reuse in Open-access publications. It contains more than 91 million cases of reused text passages found in 4.2 million unique open-access publications. Featuring high coverage of scientific disciplines and varieties of reuse, as well as comprehensive metadata to contextualize each case, our dataset addresses the most salient shortcomings of previous datasets on scientific writing. Webis-STEREO-21 supports a wide range of research questions from different scientific backgrounds, facilitating both qualitative and quantitative analysis of the phenomenon as well as a first grounding of the base rate of text reuse in scientific publications.
This is the open-access version of the dataset, which includes only the metadata of each reuse case. Due to licensing issues, the matched text is not included.