Webis Text Reuse Corpus 2012

DOI: 10.5281/zenodo.1341602
Record: https://zenodo.org/records/1341602
OAI identifier: oai:zenodo.org:1341602
Project page: https://webis.de/data/webis-trc-12.html
Creators:
  Potthast, Martin (ORCID 0000-0003-2451-0665), Bauhaus-University Weimar
  Hagen, Matthias (ORCID 0000-0002-9733-2890), Bauhaus-University Weimar
  Völske, Michael (ORCID 0000-0002-9283-6846), Bauhaus-University Weimar
  Gomoll, Jakob, Bauhaus-University Weimar
  Stein, Benno (ORCID 0000-0001-9033-2217), Bauhaus-University Weimar
Title: Webis Text Reuse Corpus 2012
Publisher: Zenodo
Publication year: 2012
Dates: 2012-09-18, 2020-01-24
Language: English (eng)
Related identifier: 10.5281/zenodo.1341601
Community: https://zenodo.org/communities/webis
License: Creative Commons Attribution Share Alike 4.0 International

The Webis Text Reuse Corpus 2012 (Webis-TRC-12) is a collection of manually written documents produced in a fully controlled yet representative environment that emulates the web. Each document addresses one of the 150 topics used in the TREC Web Tracks 2009–2011, which ties the corpus closely to existing evaluation efforts. Writers hired via the crowdsourcing platform oDesk were asked to search for sources on a given topic and to reuse text from what they found. The corpus also includes detailed interaction logs that consistently cover both the search for sources and the writing of the documents. These logs enable in-depth analyses of how text is composed when a writer is free to reuse text from third parties.