Dataset Open Access
Potthast, Martin;
Stein, Benno;
Eiselt, Andreas;
Barrón-Cedeño, Alberto;
Rosso, Paolo
The PAN plagiarism corpus 2011 (PAN-PC-11) is a corpus for the evaluation of automatic plagiarism detection algorithms. For research purposes the corpus can be used free of charge.
The PAN-PC-11 contains documents in which plagiarism has been inserted automatically as well as documents in which plagiarism has been inserted manually. The former were generated by a so-called random plagiarist, a computer program that constructs plagiarism according to a number of parameters; the latter were obtained through crowdsourcing on Amazon's Mechanical Turk.
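To make the idea of a parameter-driven random plagiarist more concrete, the following is a toy sketch only. It is not the generator used to build the PAN-PC-11; the function name `random_plagiarize` and the parameters `length`, `shuffle_ratio`, and `seed` are hypothetical and chosen purely for illustration. The sketch copies a random passage from a source document, optionally obfuscates it by swapping words, and inserts it at a random position in another document.

```python
import random


def random_plagiarize(source_words, target_words, length, shuffle_ratio=0.0, seed=None):
    """Toy illustration (NOT the actual PAN-PC-11 generator).

    Copies `length` consecutive words from `source_words`, swaps roughly
    `shuffle_ratio * length` random word pairs as a crude obfuscation, and
    inserts the result at a random offset in `target_words`.
    All parameter names are hypothetical.
    """
    rng = random.Random(seed)
    length = min(length, len(source_words))

    # Pick the passage to copy from the source document.
    start = rng.randint(0, len(source_words) - length)
    passage = list(source_words[start:start + length])

    # Crude obfuscation: swap random pairs of words inside the passage.
    if length >= 2:
        for _ in range(int(shuffle_ratio * length)):
            i, j = rng.sample(range(length), 2)
            passage[i], passage[j] = passage[j], passage[i]

    # Insert the passage at a random position in the target document.
    offset = rng.randint(0, len(target_words))
    suspicious = list(target_words[:offset]) + passage + list(target_words[offset:])

    # Record where the passage came from and where it went,
    # so that a detector's output could be scored against it.
    annotation = {"source_start": start, "offset": offset, "length": length}
    return suspicious, annotation


if __name__ == "__main__":
    source = "the quick brown fox jumps over the lazy dog".split()
    target = "an entirely unrelated sentence about corpora".split()
    doc, info = random_plagiarize(source, target, length=4, shuffle_ratio=0.5, seed=1)
    print(" ".join(doc))
    print(info)
```

Returning the annotation alongside the generated document reflects the general purpose of an evaluation corpus: the inserted passages must be known so that detection results can be compared against them.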
Name | MD5 checksum | Size |
---|---|---|
pan-plagiarism-corpus-2011.part1.rar | b2930f859497dd48ba5bb606d3f4a4f3 | 1.0 GB |
pan-plagiarism-corpus-2011.part2.rar | b23d86c17a47d2bfbdc4c314ea5810df | 703.9 MB |
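Because the archive is split into two large parts, it may be worth checking each download against the MD5 checksums listed above before extracting. The following is a minimal sketch using only the Python standard library; it assumes both files were saved under their original names in a single directory, and the helper names `md5_of_file` and `verify_downloads` are illustrative.

```python
import hashlib
from pathlib import Path

# MD5 digests as listed in the file table of this record.
EXPECTED_MD5 = {
    "pan-plagiarism-corpus-2011.part1.rar": "b2930f859497dd48ba5bb606d3f4a4f3",
    "pan-plagiarism-corpus-2011.part2.rar": "b23d86c17a47d2bfbdc4c314ea5810df",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

Reading in chunks keeps memory use constant, which matters for files of around 1 GB.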
Benno Stein, Martin Potthast, Alberto Barrón-Cedeño, Paolo Rosso, Efstathios Stamatatos, and Moshe Koppel. 4th International Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN 2010). SIGIR Forum, 45(1):45-48, June 2011.
Statistic | All versions | This version |
---|---|---|
Views | 1,715 | 1,716 |
Downloads | 2,297 | 2,297 |
Data volume | 2.1 TB | 2.1 TB |
Unique views | 1,495 | 1,496 |
Unique downloads | 962 | 962 |