Published June 1, 2011 | Version v1
Dataset Open

PAN Plagiarism Corpus 2011 (PAN-PC-11)

  • 1. Bauhaus-Universität Weimar
  • 2. Universidad Politécnica de Valencia


The PAN plagiarism corpus 2011 (PAN-PC-11) is a corpus for the evaluation of automatic plagiarism detection algorithms. For research purposes the corpus can be used free of charge.

The PAN-PC-11 contains documents in which plagiarism has been inserted automatically as well as documents in which it has been inserted manually. The automatic cases were generated by a so-called random plagiarist, a computer program that constructs plagiarism according to a number of parameters; the manual cases were obtained through crowdsourcing on Amazon's Mechanical Turk.


Files (1.7 GB)

Name Size
1.0 GB
703.9 MB

Additional details


  • Benno Stein, Martin Potthast, Alberto Barrón-Cedeño, Paolo Rosso, Efstathios Stamatatos, and Moshe Koppel. 4th International Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN 2010). SIGIR Forum, 45(1):45-48, June 2011.