There is a newer version of the record available.

Published March 19, 2020 | Version v1
Dataset Open

Understanding and Predicting Edit War Articles on Wikipedia

  • 1. Graz University of Technology

Description

This dataset was analyzed and produced during the study described in the paper "Understanding and Predicting Edit War Articles on Wikipedia" (under review; DOI and link to follow, see References). Its creation process and use cases are described in that paper.

For directions and code to process and evaluate this data, please see the corresponding GitHub repository: https://github.com/ruptho/editlinkquality-wikipedia.

We provide three subsets for 4800 Wikipedia articles (in .pkl format):

  • "article_revisions_labeled.pkl" provides the final, semantically labeled revisions for each analyzed article, per quality category.
  • "article_revision_features.zip" contains the processed per-article features, divided into folders by the quality category they belong to.
  • "article_revision_features_raw.zip" contains the raw features as retrieved via the RevScoring API (https://pythonhosted.org/revscoring/).
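As a starting point for inspecting the .pkl data, the following is a minimal sketch that uses only Python's standard pickle module. It assumes the file has already been downloaded to a local path (the path shown is hypothetical) and makes no assumption about the type of the stored object. If the object was saved from a third-party library such as pandas, that library must be installed for unpickling to succeed. Unpickling can execute arbitrary code, so this should only be run on files obtained from a trusted source. The helper names load_pickle and summarize are illustrative and do not come from the repository.

```python
import pickle
from pathlib import Path


def load_pickle(path):
    """Load a pickled object from disk. Only use this on files you trust."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


def summarize(obj):
    """Return (type name, length or None) without assuming the object's type."""
    kind = type(obj).__name__
    size = len(obj) if hasattr(obj, "__len__") else None
    return kind, size


if __name__ == "__main__":
    # Hypothetical local path; adjust to wherever the file was downloaded.
    local_path = Path("article_revisions_labeled.pkl")
    if local_path.exists():
        data = load_pickle(local_path)
        print(summarize(data))
    else:
        print(f"{local_path} not found; download it from the record first.")
```

For the actual processing and evaluation steps, the GitHub repository linked above remains the authoritative reference.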

Files

Files (17.4 GB)

Size and checksum of each file:

  • 1.7 GB, md5:f39d23469f39cd226b6fe6863539421b
  • 15.2 GB, md5:69e3c38cf5415608b04f670d5f4f2095
  • 565.6 MB, md5:9eda636b0d85f0bd64bddb9a0f1e5e60
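Because the archives are large, it can be worth checking a download against its listed MD5 checksum before processing it. The sketch below streams a file through Python's hashlib in fixed-size chunks so that a multi-gigabyte archive never has to fit in memory. The function names and the chunk size are illustrative choices, and the caller is assumed to supply the checksum shown for the specific file that was downloaded.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed checksum such as 'md5:<hex digest>'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

A call such as matches_listing(local_path, "md5:...") returns True only when the local file is byte-identical to the published one; a mismatch usually indicates an incomplete or corrupted download.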

Additional details

References

  • Understanding and Predicting Edit Wars on Wikipedia (2020, Under Review). Thorsten Ruprechter, Tiago Santos, Denis Helic. Applied Network Science.