Dataset Open Access
Potthast, Martin; Hagen, Matthias; Völske, Michael; Gomoll, Jakob; Stein, Benno
<?xml version='1.0' encoding='UTF-8'?> <record xmlns="http://www.loc.gov/MARC21/slim"> <leader>00000nmm##2200000uu#4500</leader> <datafield tag="999" ind1="C" ind2="5"> <subfield code="x">Potthast, M., Hagen, M., Völske, M., and Stein, B. (2013). Crowdsourcing Interaction Logs to Understand Text Reuse from the Web. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 13), P. Fung, and M. Poesio, eds. (Association for Computational Linguistics), pp. 1212–1221.</subfield> </datafield> <datafield tag="999" ind1="C" ind2="5"> <subfield code="x">Potthast, M., Hagen, M., Völske, M., and Stein, B. (2013). Exploratory Search Missions for TREC Topics. In 3rd European Workshop on Human-Computer Interaction and Information Retrieval (EuroHCIR 2013), M.L. Wilson, T. Russell-Rose, B. Larsen, P. Hansen, and K. Norling, eds. (CEUR-WS.org), pp. 11–14.</subfield> </datafield> <datafield tag="999" ind1="C" ind2="5"> <subfield code="x">Hagen, M., Potthast, M., Völske, M., Gomoll, J., and Stein, B. (2016). How Writers Search: Analyzing the Search and Writing Logs of Non-fictional Essays. In Proceedings of the 1st ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR 16), D. Kelly, R. Capra, N. Belkin, J. Teevan, and P. Vakkari, eds. (ACM), pp. 
193–202.</subfield> </datafield> <datafield tag="041" ind1=" " ind2=" "> <subfield code="a">eng</subfield> </datafield> <controlfield tag="005">20200124192503.0</controlfield> <controlfield tag="001">1341602</controlfield> <datafield tag="711" ind1=" " ind2=" "> <subfield code="d">August 2013</subfield> <subfield code="g">ACL 2013</subfield> <subfield code="a">51st Annual Meeting of the Association for Computational Linguistics</subfield> <subfield code="c">Sofia, Bulgaria</subfield> </datafield> <datafield tag="700" ind1=" " ind2=" "> <subfield code="u">Bauhaus-University Weimar</subfield> <subfield code="0">(orcid)0000-0002-9733-2890</subfield> <subfield code="a">Hagen, Matthias</subfield> </datafield> <datafield tag="700" ind1=" " ind2=" "> <subfield code="u">Bauhaus-University Weimar</subfield> <subfield code="0">(orcid)0000-0002-9283-6846</subfield> <subfield code="a">Völske, Michael</subfield> </datafield> <datafield tag="700" ind1=" " ind2=" "> <subfield code="u">Bauhaus-University Weimar</subfield> <subfield code="a">Gomoll, Jakob</subfield> </datafield> <datafield tag="700" ind1=" " ind2=" "> <subfield code="u">Bauhaus-University Weimar</subfield> <subfield code="0">(orcid)0000-0001-9033-2217</subfield> <subfield code="a">Stein, Benno</subfield> </datafield> <datafield tag="856" ind1="4" ind2=" "> <subfield code="s">120705808</subfield> <subfield code="z">md5:d5ad8bf12a335b2bc38c8d9db5ffbf0a</subfield> <subfield code="u">https://zenodo.org/record/1341602/files/corpus-webis-trc-12.tar.xz</subfield> </datafield> <datafield tag="542" ind1=" " ind2=" "> <subfield code="l">open</subfield> </datafield> <datafield tag="260" ind1=" " ind2=" "> <subfield code="c">2012-09-18</subfield> </datafield> <datafield tag="909" ind1="C" ind2="O"> <subfield code="p">openaire_data</subfield> <subfield code="p">user-webis</subfield> <subfield code="o">oai:zenodo.org:1341602</subfield> </datafield> <datafield tag="100" ind1=" " ind2=" "> <subfield code="u">Bauhaus-University 
Weimar</subfield> <subfield code="0">(orcid)0000-0003-2451-0665</subfield> <subfield code="a">Potthast, Martin</subfield> </datafield> <datafield tag="245" ind1=" " ind2=" "> <subfield code="a">Webis Text Reuse Corpus 2012</subfield> </datafield> <datafield tag="980" ind1=" " ind2=" "> <subfield code="a">user-webis</subfield> </datafield> <datafield tag="540" ind1=" " ind2=" "> <subfield code="u">https://creativecommons.org/licenses/by-sa/4.0/legalcode</subfield> <subfield code="a">Creative Commons Attribution Share Alike 4.0 International</subfield> </datafield> <datafield tag="650" ind1="1" ind2="7"> <subfield code="a">cc-by</subfield> <subfield code="2">opendefinition.org</subfield> </datafield> <datafield tag="520" ind1=" " ind2=" "> <subfield code="a">The Webis Text Reuse Corpus 2012 (Webis-TRC-12) compiles manually written documents obtained from a completely controlled, yet representative environment that emulates the web. Each document in the corpus is about one of the 150 topics used at the TREC Web Tracks 2009–2011, thus forming a strong connection with existing evaluation efforts. Writers, hired on the crowdsourcing platform oDesk, had to retrieve sources for a given topic and reuse text from what they found. The corpus includes detailed interaction logs that consistently cover both the search for sources and the creation of documents. 
These logs allow in-depth analyses of how text is composed when a writer is at liberty to reuse text from third parties.</subfield> </datafield> <datafield tag="773" ind1=" " ind2=" "> <subfield code="n">doi</subfield> <subfield code="i">isVersionOf</subfield> <subfield code="a">10.5281/zenodo.1341601</subfield> </datafield> <datafield tag="024" ind1=" " ind2=" "> <subfield code="a">10.5281/zenodo.1341602</subfield> <subfield code="2">doi</subfield> </datafield> <datafield tag="024" ind1=" " ind2=" "> <subfield code="q">alternateidentifier</subfield> <subfield code="a">https://webis.de/data/webis-trc-12.html</subfield> <subfield code="2">url</subfield> </datafield> <datafield tag="980" ind1=" " ind2=" "> <subfield code="a">dataset</subfield> </datafield> </record>
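
The record's 856 field lists a byte size and an MD5 checksum for the archive, so a download can be checked against them. The following is a minimal sketch, not part of the record itself: the function names `md5_of_file` and `verify_download` and the local file path are assumptions made for illustration, while the size and checksum constants are copied from the record.

```python
import hashlib
from pathlib import Path

# Values copied from the record's 856 field.
EXPECTED_SIZE = 120705808  # bytes, subfield "s"
EXPECTED_MD5 = "d5ad8bf12a335b2bc38c8d9db5ffbf0a"  # subfield "z", "md5:" prefix removed


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_size=EXPECTED_SIZE, expected_md5=EXPECTED_MD5):
    """Check the file size first (cheap), then the MD5 digest."""
    path = Path(path)
    if path.stat().st_size != expected_size:
        return False
    return md5_of_file(path) == expected_md5.lower()


if __name__ == "__main__":
    # Hypothetical local path; adjust to wherever the archive was saved.
    archive = Path("corpus-webis-trc-12.tar.xz")
    if archive.exists():
        print("OK" if verify_download(archive) else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

The size comparison runs before hashing so that a truncated download is rejected without reading the whole file.
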
Metric | All versions | This version |
---|---|---|
Views | 541 | 537 |
Downloads | 261 | 261 |
Data volume | 31.5 GB | 31.5 GB |
Unique views | 489 | 485 |
Unique downloads | 174 | 174 |