Published June 4, 2022 | Version 1.0.0-beta
Dataset Open

Wikidata subset with revision history information [JSON]

  • 1. University of Oviedo

Description

This dataset contains the complete revision history of every instance of the 100 most important classes in Wikidata. It comprises 9.3 million entities and around 450 million revisions made to those entities. The dataset was exported from a MongoDB database. After decompressing the files, the resulting JSON files can be imported into MongoDB using the following commands:

mongoimport --db=db_name --collection=wd_entities --file=wd_entities.json
mongoimport --db=db_name --collection=wd_revisions --file=wd_revisions.json

Make sure to replace db_name with the name of the database into which the data will be imported.

Documents within the wd_entities collection have the following schema:

  • id: Internal id of the entity used by Wikidata (e.g. 8195238).
  • entity_id: Public id of the entity in Wikidata (e.g. 'Q42').
  • class_ids: List of classes that the entity belongs to (e.g. ['Q5', 'Q100']).
  • entity_json: JSON contents of the entity, following Wikidata's JSON data model (https://doc.wikimedia.org/Wikibase/master/php/md_docs_topics_json.html).
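
As an illustration of the schema above, the following minimal sketch streams documents from an exported file and keeps only the entities that belong to a given class. It assumes that each file holds one JSON document per line (the default layout read by mongoimport when --jsonArray is not given) and that numeric fields are exported as plain JSON numbers; neither detail is confirmed by this description. The function names and the sample documents are hypothetical.

```python
import io
import json
from typing import IO, Iterator


def iter_documents(lines: IO[str]) -> Iterator[dict]:
    """Yield one parsed document per non-empty line (assumed file layout)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def entities_of_class(lines: IO[str], class_id: str) -> Iterator[dict]:
    """Yield the documents whose class_ids list contains class_id."""
    for doc in iter_documents(lines):
        if class_id in doc.get("class_ids", []):
            yield doc


# Hypothetical two-document sample shaped like the wd_entities schema.
sample = io.StringIO(
    '{"id": 8195238, "entity_id": "Q42", "class_ids": ["Q5"], "entity_json": {}}\n'
    '{"id": 1, "entity_id": "Q1", "class_ids": ["Q100"], "entity_json": {}}\n'
)
humans = [doc["entity_id"] for doc in entities_of_class(sample, "Q5")]
print(humans)
```

To run this against the real data, the in-memory sample could be replaced by a file opened with open("wd_entities.json", encoding="utf-8"). Reading line by line avoids loading the whole file into memory.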

Documents within the wd_revisions collection have the following schema:

  • id: Identifier of the revision (e.g. 15921539).
  • entity_id: Public id of the entity in Wikidata affected by this revision (e.g. 'Q42').
  • class_ids: List of classes that the entity affected by this revision belongs to (e.g. ['Q5', 'Q100']).
  • parent_id: Identifier of the revision that precedes this one, if it exists (e.g. 15921214).
  • timestamp: Date and time at which the revision was made, following the ISO 8601 format (e.g. +2019-05-27T09:31:10Z).
  • username: Username of the user that made the revision.
  • comment: Comments made by the user in the revision, if any.
  • entity_diff: List of operations made in this revision, following the JSON Patch format.
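
The sketch below shows one way the timestamp and entity_diff fields might be consumed. It assumes that the operations in entity_diff turn the entity state of the parent revision into the state of this revision, which this description does not state explicitly. The helper names (parse_timestamp, apply_patch) and the sample data are hypothetical, and only the 'add', 'remove' and 'replace' operations of JSON Patch (RFC 6902) are handled; a maintained JSON Patch library would be a safer choice for real use.

```python
import copy
from datetime import datetime, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse a value such as '+2019-05-27T09:31:10Z' into an aware UTC datetime."""
    parsed = datetime.strptime(value.lstrip("+"), "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def _tokens(pointer: str) -> list:
    """Split a JSON Pointer (RFC 6901) into unescaped reference tokens."""
    if pointer == "":
        return []
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer.split("/")[1:]]


def _step(container, token):
    """Descend one level into a list (by index) or a dict (by key)."""
    return container[int(token)] if isinstance(container, list) else container[token]


def apply_patch(document, operations):
    """Return a patched deep copy; supports only 'add', 'remove' and 'replace'."""
    doc = copy.deepcopy(document)
    for op in operations:
        tokens = _tokens(op["path"])
        if not tokens:
            raise ValueError("whole-document operations are not supported in this sketch")
        *parents, last = tokens
        target = doc
        for token in parents:
            target = _step(target, token)
        kind = op["op"]
        if kind not in ("add", "remove", "replace"):
            raise ValueError(f"unsupported operation: {kind}")
        if isinstance(target, list):
            # '-' refers to the position after the last array element.
            index = len(target) if last == "-" else int(last)
            if kind == "add":
                target.insert(index, op["value"])
            elif kind == "remove":
                del target[index]
            else:
                target[index] = op["value"]
        elif kind == "remove":
            del target[last]
        else:
            target[last] = op["value"]
    return doc


# Hypothetical entity state and diff, shaped like the fields described above.
before = {"labels": {"en": {"language": "en", "value": "Douglas Adams"}}, "aliases": {"en": []}}
diff = [
    {"op": "replace", "path": "/labels/en/value", "value": "Douglas Noel Adams"},
    {"op": "add", "path": "/aliases/en/-", "value": {"language": "en", "value": "DNA"}},
]
after = apply_patch(before, diff)
made_at = parse_timestamp("+2019-05-27T09:31:10Z")
```

Working on a deep copy keeps the earlier state intact, which is convenient when replaying a chain of revisions linked through parent_id.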

Files

Files (33.8 GB)

Checksum                                 Size
md5:bb9fb776d4972d5b984774160edbd7ab     9.3 GB
md5:153214d7dc6bcbc41ac53ddb574748fa     24.4 GB
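
Given the size of the files, it may be worth checking a download against the MD5 checksums listed above before decompressing it. The following is a minimal sketch that hashes a file in chunks, so that it never has to fit in memory. The function names are hypothetical, and the demonstration uses a small temporary file rather than the real archives, whose file names are not shown here.

```python
import hashlib
import os
import tempfile


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex>' or plain '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


# Demonstration on a small temporary file standing in for a downloaded archive.
payload = b"example payload"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    demo_path = tmp.name

demo_expected = "md5:" + hashlib.md5(payload).hexdigest()
demo_ok = matches_checksum(demo_path, demo_expected)
demo_bad = matches_checksum(demo_path, "md5:" + "0" * 32)
os.remove(demo_path)
print(demo_ok, demo_bad)
```

For a real download, matches_checksum would be called with the local path of the file and the corresponding md5: value from the list above.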

Additional details

Related works

Is supplemented by
Dataset: 10.5281/zenodo.6613875 (DOI)