Published November 25, 2022 | Version v1
Dataset Open

Wikidata dump extension (enwiki section links)

  • 1. University of Fribourg, Switzerland
  • 2. New York University, Abu Dhabi, UAE

Description

The dataset contains mappings between Wikidata entities and Wikipedia sections. These mappings complement the existing Wikidata sitelinks, which reference whole Wikipedia pages.

The present dataset stems from the observation that only a fraction of Wikidata entities have a corresponding Wikipedia article in any language (we refer to the remaining entities, without an article, as orphans). However, a substantial number of orphan entities are in fact covered in Wikipedia, just not at the page level: an orphan entity can be described within an existing Wikipedia article, in a section, subsection, or paragraph of an article about a more generic concept or fact. The dataset provides a fine-grained mapping between Wikidata orphan entities and Wikipedia (sub)sections.

Mappings are provided for the English-language Wikipedia (enwiki).

The dataset is available in JSON and RDF formats and complies with the Wikibase data model.

In the JSON representation, an entity contains two fields: id (the unique identifier of the entity) and sectionlinks (links to Wikipedia sections). The sectionlinks field maps each site key (e.g., enwiki) to a list of records¹, each with three fields: site, title, and url. The section title is appended to the page title, separated by a # symbol. This compound title is then URL-encoded and added to the URL path. Following the Wikidata guidelines, each entity is encoded as a single line.

Example:

{
    "id": "Q715509",
    "sectionlinks": {
        "enwiki": [
            {
                "site": "enwiki",
                "title": "Places in Harry Potter#Azkaban",
                "url": "https://en.wikipedia.org/wiki/Places_in_Harry_Potter#Azkaban"
            }
        ]
    }
}
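
The JSON layout described above can be read with the Python standard library alone. The following is a minimal sketch, not the authors' tooling: the function names (section_url, iter_sectionlinks) are hypothetical, the tolerance for a trailing comma or enclosing brackets is an assumption borrowed from the Wikidata dump layout, and the exact escaping rules used when the dataset was built are not stated in this description, so the percent-encoding here is only one plausible choice.

```python
import json
from urllib.parse import quote

# Base path taken from the example URL in this description.
WIKI_BASE = "https://en.wikipedia.org/wiki/"


def section_url(page_title, section_title):
    """Build a section URL from a page title and a section title."""
    compound = f"{page_title}#{section_title}".replace(" ", "_")
    # Assumption: '#' stays literal, everything else is percent-encoded.
    return WIKI_BASE + quote(compound, safe="#")


def iter_sectionlinks(lines):
    """Yield (entity_id, page_title, section_title, url) per section link."""
    for line in lines:
        # Assumption: tolerate a trailing comma and array brackets,
        # as in the regular Wikidata JSON dumps.
        line = line.strip().rstrip(",")
        if not line or line in ("[", "]"):
            continue
        entity = json.loads(line)
        for records in entity.get("sectionlinks", {}).values():
            for record in records:
                # '#' cannot occur in a page title, so the first '#'
                # separates the page title from the section title.
                page, _, section = record["title"].partition("#")
                yield entity["id"], page, section, record["url"]


sample = (
    '{"id": "Q715509", "sectionlinks": {"enwiki": [{"site": "enwiki", '
    '"title": "Places in Harry Potter#Azkaban", '
    '"url": "https://en.wikipedia.org/wiki/Places_in_Harry_Potter#Azkaban"}]}}'
)
for row in iter_sectionlinks([sample]):
    print(row)
```

For a real dump file, an open file handle can be passed in place of the one-element list, since iterating over a file yields its lines.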

 

The RDF dump is serialized using the Turtle format and stores nodes describing Wikipedia links. Section titles are added in the same manner as described above.

Example:

<https://en.wikipedia.org/wiki/Places_in_Harry_Potter#Azkaban> a schema:Article ;
        schema:about wd:Q715509 ;
        schema:inLanguage "en" ;
        schema:isPartOf <https://en.wikipedia.org/> ;
        schema:name "Places in Harry Potter#Azkaban"@en .

<https://en.wikipedia.org/> wikibase:wikiGroup "wikipedia" .
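
To make the correspondence between the two formats concrete, the sketch below renders one JSON sectionlink record as a Turtle node shaped like the example above. It is an illustration under assumptions, not the procedure used to produce the dump: the helper names (turtle_escape, record_to_turtle) are hypothetical, and the schema: and wd: prefixes are assumed to be declared elsewhere in the file.

```python
from urllib.parse import urlsplit


def turtle_escape(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def record_to_turtle(entity_id, record, lang="en"):
    """Render one sectionlink record as a Turtle node (prefixes assumed)."""
    parts = urlsplit(record["url"])
    # The site root, e.g. https://en.wikipedia.org/, is derived from the URL.
    site_root = f"{parts.scheme}://{parts.netloc}/"
    return (
        f'<{record["url"]}> a schema:Article ;\n'
        f'        schema:about wd:{entity_id} ;\n'
        f'        schema:inLanguage "{lang}" ;\n'
        f'        schema:isPartOf <{site_root}> ;\n'
        f'        schema:name "{turtle_escape(record["title"])}"@{lang} .\n'
    )


record = {
    "site": "enwiki",
    "title": "Places in Harry Potter#Azkaban",
    "url": "https://en.wikipedia.org/wiki/Places_in_Harry_Potter#Azkaban",
}
print(record_to_turtle("Q715509", record))
```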

 

¹ Unlike sitelinks, where each entity is mapped to at most one Wikipedia page per site (a one-to-one mapping), sectionlinks allow a one-to-many mapping, i.e., an entity can be mapped to multiple sections. For example, the Tennis racket concept can be mapped to both the Tennis#Rackets and the Racket (sports equipment)#Tennis sections.

Files

Files (11.1 MB)

Checksum                                Size
md5:d26115f5b4b00817d18f2ec95b63ddb7    5.5 MB
md5:54ff71ec3e17d59c51842f501e0ae879    5.6 MB