Dataset Open Access

PAN Plagiarism Corpus 2010 (PAN-PC-10)

Potthast, Martin; Stein, Benno; Eiselt, Andreas; Barrón-Cedeño, Alberto; Rosso, Paolo


Citation Style Language JSON Export

{
  "publisher": "Zenodo", 
  "DOI": "10.5281/zenodo.3250123", 
  "language": "eng", 
  "title": "PAN Plagiarism Corpus 2010 (PAN-PC-10)", 
  "issued": {
    "date-parts": [
      [
        2010, 
        5, 
        1
      ]
    ]
  }, 
  "abstract": "<p>This corpus is outdated. Please use its successor PAN-PC-11: https://doi.org/10.5281/zenodo.3250095</p>\n\n<p>The PAN plagiarism corpus 2010 (PAN-PC-10) is a corpus for the evaluation of automatic plagiarism detection algorithms. For research purposes the corpus can be used free of charge.</p>\n\n<p>The PAN-PC-10 contains documents in which artificial plagiarism has been inserted automatically as well as documents in which simulated plagiarism has been inserted manually. The former have been constructed using a so-called random plagiarist, a computer program which constructs plagiarism according to a number of parameters, while the latter have been obtained with crowdsourcing via Amazon&#39;s Mechanical Turk.</p>", 
  "author": [
    {
      "family": "Potthast, Martin"
    }, 
    {
      "family": "Stein, Benno"
    }, 
    {
      "family": "Eiselt, Andreas"
    }, 
    {
      "family": "Barr\u00f3n-Cede\u00f1o, Alberto"
    }, 
    {
      "family": "Rosso, Paolo"
    }
  ], 
  "type": "dataset", 
  "id": "3250123"
}
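A consumer of this export would likely want a real date, separate family and given names, and a plain-text abstract. The following is a minimal sketch of that normalization, assuming the field layout shown above. The helper names (`issued_date`, `split_author`, `plain_abstract`) are hypothetical and not part of any Zenodo or CSL tooling. The `RECORD` literal is a trimmed copy of the export that keeps only the first paragraph of the abstract.

```python
import html
import re
from datetime import date

# Trimmed copy of the export above (abstract shortened to its first paragraph).
RECORD = {
    "DOI": "10.5281/zenodo.3250123",
    "title": "PAN Plagiarism Corpus 2010 (PAN-PC-10)",
    "issued": {"date-parts": [[2010, 5, 1]]},
    "abstract": "<p>This corpus is outdated. Please use its successor "
                "PAN-PC-11: https://doi.org/10.5281/zenodo.3250095</p>",
    "author": [
        {"family": "Potthast, Martin"},
        {"family": "Stein, Benno"},
        {"family": "Eiselt, Andreas"},
        {"family": "Barr\u00f3n-Cede\u00f1o, Alberto"},
        {"family": "Rosso, Paolo"},
    ],
}


def issued_date(record):
    """Turn CSL "date-parts" into a date; a missing month or day becomes 1."""
    parts = list(record["issued"]["date-parts"][0])
    year, month, day = (parts + [1, 1])[:3]
    return date(year, month, day)


def split_author(entry):
    """Split a "Family, Given" string stored in "family" into two fields.

    Assumption: this export puts the full name in "family". Entries that
    already carry a "given" field are returned unchanged.
    """
    if "given" in entry:
        return dict(entry)
    family, _, given = entry["family"].partition(",")
    return {"family": family.strip(), "given": given.strip()}


def plain_abstract(record):
    """Drop HTML tags and decode entities such as &#39; in the abstract."""
    without_tags = re.sub(r"<[^>]+>", "", record["abstract"])
    return html.unescape(without_tags).strip()


if __name__ == "__main__":
    print(issued_date(RECORD).isoformat())
    for author in RECORD["author"]:
        print(split_author(author))
    print(plain_abstract(RECORD))
```

The regular expression is a deliberately crude tag stripper that is adequate for the simple `<p>` markup in this abstract; arbitrary HTML would call for a proper parser.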
Statistic          All versions   This version
Views              625            626
Downloads          426            426
Data volume        388.3 GB       388.3 GB
Unique views       571            572
Unique downloads   191            191
