Dataset Open Access

Relating Wikipedia Article Quality to Edit Behavior and Link Structure

Thorsten Ruprechter

This dataset was produced and analyzed in the study described in the paper "Relating Wikipedia Article Quality to Edit Behavior and Link Structure" (under review; DOI and link to follow, see references). The paper describes how the dataset was created and how it can be used.

For directions and code to process and evaluate this data, please see the corresponding GitHub repository: https://github.com/ruptho/editlinkquality-wikipedia.

We provide three files covering 4941 Wikipedia articles (in .pkl format):
  • "article_revisions_labeled.pkl" provides the final, semantically labeled revisions for each analyzed article, per quality category.
  • "article_revision_features.zip" contains the processed per-article features, divided into folders by the quality category each article belongs to.
  • "article_revision_features_raw.zip" contains the raw features as retrieved via the RevScoring API (https://pythonhosted.org/revscoring/).
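Because the feature archives are large, it can help to inspect their folder layout before extracting anything. The following is a minimal sketch using only the Python standard library. It assumes that the quality-category folders sit at the top level of the archive, which this record does not confirm; the function name count_files_per_folder is hypothetical and not part of the accompanying repository.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def count_files_per_folder(zip_path):
    """Count archive members per top-level folder without extracting them.

    Assumption: the quality-category folders are the top-level entries
    of the archive. Files stored at the archive root are counted under ".".
    """
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip explicit directory entries
            parts = PurePosixPath(info.filename).parts
            folder = parts[0] if len(parts) > 1 else "."
            counts[folder] += 1
    return dict(counts)


if __name__ == "__main__":
    # File name as listed on this record; adjust the path to your download.
    print(count_files_per_folder("article_revision_features.zip"))
```

If the archive turns out to wrap everything in a single parent directory, the counts will collapse into one entry, and the folder index (parts[0]) would need to be shifted by one level.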

Files (20.3 GB)
Name                                 Size       md5
article_revision_features.zip        1.9 GB     9a56d7c7165fc8d19e9dbc5d814d5262
article_revision_features_raw.zip    17.8 GB    2ec3e4edf1af716cc643452e4c638214
article_revisions_labeled.pkl        629.7 MB   c8971490f1a1ba633f0d59389eccd7cd
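
The md5 checksums listed above can be used to confirm that a download completed without corruption. The sketch below is one way to do this with Python's standard hashlib module; it reads each file in chunks so that the 17.8 GB archive does not need to fit in memory. The helper names md5_of_file and verify_download are hypothetical, and the script assumes the files were saved under their listed names in the current working directory.

```python
import hashlib
import os

# Checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "article_revision_features.zip": "9a56d7c7165fc8d19e9dbc5d814d5262",
    "article_revision_features_raw.zip": "2ec3e4edf1af716cc643452e4c638214",
    "article_revisions_labeled.pkl": "c8971490f1a1ba633f0d59389eccd7cd",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's md5 digest matches the expected value."""
    return md5_of_file(path) == expected_md5.lower()


if __name__ == "__main__":
    for name, expected in EXPECTED_MD5.items():
        if not os.path.exists(name):
            print(f"{name}: not found")
            continue
        status = "OK" if verify_download(name, expected) else "MISMATCH"
        print(f"{name}: {status}")
```

A mismatch usually points to an interrupted or truncated download, in which case the file should be fetched again before any processing.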
  • Relating Wikipedia Article Quality to Edit Behavior and Link Structure (2020, Under Review). Thorsten Ruprechter, Tiago Santos, Denis Helic. Applied Network Science.
