Dataset Open Access

Webis Wikipedia Vandalism Corpus (Webis-WVC-07)

Potthast, Martin; Gerling, Robert; Stein, Benno

This corpus is outdated. Please use its successors PAN-WVC-10 and PAN-WVC-11.

The Webis Wikipedia Vandalism Corpus (Webis-WVC-07) is a corpus for the evaluation of automatic vandalism detection algorithms for Wikipedia. For research purposes the corpus can be used free of charge.

The corpus is the first standardized test collection for the comparison of vandalism detection algorithms. It comprises 940 edits, of which 301 were marked as vandalism by human evaluators.

Files (10.4 kB)
Name: webis-wikipedia-vandalism-corpus-2007.zip
Size: 10.4 kB
md5: d3d82a4f90013dc333bb805b0a227433
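
The md5 value listed for the archive can be used to check that a download is intact. The following is a minimal sketch, assuming the zip file has already been saved locally. The function names `md5_of_file` and `verify_download` are hypothetical helpers introduced here for illustration and are not part of the corpus.

```python
import hashlib

# Checksum as published on this record for
# webis-wikipedia-vandalism-corpus-2007.zip
EXPECTED_MD5 = "d3d82a4f90013dc333bb805b0a227433"


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_download("webis-wikipedia-vandalism-corpus-2007.zip")` on a local copy (the path is an assumption) returns `True` only when the digest matches the value above. MD5 is suitable here for detecting corrupted transfers, not for protecting against deliberate tampering.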
  • Martin Potthast, Benno Stein, and Robert Gerling. Automatic Vandalism Detection in Wikipedia. In Craig Macdonald et al., editors, Advances in Information Retrieval. 30th European Conference on IR Research (ECIR 2008), volume 4956 of Lecture Notes in Computer Science, pages 663-668, Berlin Heidelberg New York, 2008. Springer. ISBN 978-3-540-78645-0. ISSN 0302-9743.
