Dataset Open Access
Potthast, Martin;
Stein, Benno;
Eiselt, Andreas;
Barrón-Cedeño, Alberto;
Rosso, Paolo
<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nmm##2200000uu#4500</leader>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">Alberto Barrón-Cedeño, Martin Potthast, Paolo Rosso, Benno Stein, and Andreas Eiselt. Corpus and Evaluation Measures for Automatic Plagiarism Detection. In Nicoletta Calzolari et al., editors, 7th Conference on International Language Resources and Evaluation (LREC 10), May 2010. European Language Resources Association (ELRA). ISBN 2-9517408-6-7.</subfield>
  </datafield>
  <datafield tag="041" ind1=" " ind2=" ">
    <subfield code="a">eng</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">plagiarism</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">plagiarism detection</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">documents</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">PAN</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">2010</subfield>
  </datafield>
  <controlfield tag="005">20200124192558.0</controlfield>
  <controlfield tag="001">3250123</controlfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Bauhaus-Universität Weimar</subfield>
    <subfield code="0">(orcid)0000-0001-9033-2217</subfield>
    <subfield code="a">Stein, Benno</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Bauhaus-Universität Weimar</subfield>
    <subfield code="a">Eiselt, Andreas</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Universidad Politécnica de Valencia</subfield>
    <subfield code="a">Barrón-Cedeño, Alberto</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Universidad Politécnica de Valencia</subfield>
    <subfield code="a">Rosso, Paolo</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">1073741824</subfield>
    <subfield code="z">md5:66e4f2801f097da2c1537453d6edf4ee</subfield>
    <subfield code="u">https://zenodo.org/record/3250123/files/pan-plagiarism-corpus-2010.part1.rar</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">700199388</subfield>
    <subfield code="z">md5:629861d970aeda647ff7b7c4c1cc70f4</subfield>
    <subfield code="u">https://zenodo.org/record/3250123/files/pan-plagiarism-corpus-2010.part2.rar</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2010-05-01</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">openaire_data</subfield>
    <subfield code="p">user-pan</subfield>
    <subfield code="p">user-webis</subfield>
    <subfield code="o">oai:zenodo.org:3250123</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">Bauhaus-Universität Weimar</subfield>
    <subfield code="0">(orcid)0000-0003-2451-0665</subfield>
    <subfield code="a">Potthast, Martin</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">PAN Plagiarism Corpus 2010 (PAN-PC-10)</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">user-pan</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">user-webis</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u">https://creativecommons.org/licenses/by/4.0/legalcode</subfield>
    <subfield code="a">Creative Commons Attribution 4.0 International</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a"><p>This corpus is outdated. Please use its successor PAN-PC-11: https://doi.org/10.5281/zenodo.3250095</p> <p>The PAN plagiarism corpus 2010 (PAN-PC-10) is a corpus for the evaluation of automatic plagiarism detection algorithms. For research purposes the corpus can be used free of charge.</p> <p>The PAN-PC-10 contains documents in which artificial plagiarism has been inserted automatically as well as documents in which simulated plagiarism has been inserted manually. The former have been constructed using a so-called random plagiarist, a computer program which constructs plagiarism according to a number of parameters, while the latter have been obtained with crowdsourcing via Amazon&#39;s Mechanical Turk.</p></subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.3250122</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.3250123</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">dataset</subfield>
  </datafield>
</record>
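
The record lists an MD5 checksum for each of the two archive parts (subfield `z` of the `856` fields). A minimal sketch of how a download could be checked against those values, assuming the files were saved under their original names in a single directory; the helper names `md5_of_file` and `verify_downloads` are hypothetical and not part of any Zenodo tooling:

```python
import hashlib
from pathlib import Path

# Checksums copied from subfield "z" of the record's two 856 fields.
EXPECTED_MD5 = {
    "pan-plagiarism-corpus-2010.part1.rar": "66e4f2801f097da2c1537453d6edf4ee",
    "pan-plagiarism-corpus-2010.part2.rar": "629861d970aeda647ff7b7c4c1cc70f4",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected file name to True (match), False (mismatch), or None (file missing)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of_file(path) == expected) if path.exists() else None
    return results
```

Calling `verify_downloads("path/to/downloads")` returns one entry per archive part; a `False` entry suggests a corrupted or incomplete download that should be fetched again before extraction.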
Metric | All versions | This version |
---|---|---|
Views | 625 | 626 |
Downloads | 426 | 426 |
Data volume | 388.3 GB | 388.3 GB |
Unique views | 571 | 572 |
Unique downloads | 191 | 191 |