Wikary: A Dataset of N-ary Wikipedia Tables Matched to Qualified Wikidata Statements

Mazurek, Igor; Wiewel, Berend; Kruit, Benno

doi:10.5281/zenodo.7025005

Published August 25, 2022 | Version v1

Dataset Open


Created for the SemTab 2022 Datasets Track challenge.

General

Column names used in both files:

`lang` - language and Wikipedia version used

`pageTitle` - page title

`tableIndex` - index of the table's position on the page
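
Because these three columns appear in both files, they can plausibly serve as a join key between a table and its matches. The README does not state this explicitly, so the sketch below treats it as an assumption. The helper names and the sample rows are hypothetical and are not taken from the dataset.

```python
import csv
import io

# Columns the README lists as shared by both files.
KEY_COLUMNS = ("lang", "pageTitle", "tableIndex")


def table_key(row):
    """Return the (lang, pageTitle, tableIndex) tuple for a CSV row."""
    return tuple(row[column] for column in KEY_COLUMNS)


def group_matches_by_table(table_rows, match_rows):
    """Map each table key to the list of match rows that share that key."""
    grouped = {table_key(table): [] for table in table_rows}
    for match in match_rows:
        grouped.setdefault(table_key(match), []).append(match)
    return grouped


# Hypothetical in-memory samples standing in for the two CSV files.
tables_csv = (
    "lang,pageTitle,tableIndex,tableCaption\n"
    "en,Example page,0,Example caption\n"
)
matches_csv = (
    "lang,pageTitle,tableIndex,rowIndex\n"
    "en,Example page,0,1\n"
    "en,Example page,0,2\n"
)

tables = list(csv.DictReader(io.StringIO(tables_csv)))
matches = list(csv.DictReader(io.StringIO(matches_csv)))
grouped = group_matches_by_table(tables, matches)
```

`csv.DictReader` returns every value as a string, so `tableIndex` is compared as text here. Convert it to an integer on both sides if numeric comparison is needed.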

Tables file

Column names used only in the tables file (`wikary_nary_tables.csv`):

`pageEntity` - Wikidata entity associated with the page

`sectionTitle` - the title of the section where the table is located

`tableCaption` - caption of the table

`headers` - headers of the table

`HTML` - HTML of the table
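
Since the `HTML` column stores the table markup, cell text has to be recovered by parsing it. The following is a minimal sketch using only the Python standard library. The class name, the function name, and the sample markup are hypothetical, and the sketch ignores nested tables, `rowspan`, and `colspan`, which real Wikipedia tables may use.

```python
from html.parser import HTMLParser


class TableCellExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return the table as a list of rows, each a list of cell strings."""
    parser = TableCellExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical markup standing in for one value of the HTML column.
sample_html = (
    "<table><tr><th>Year</th><th>Winner</th></tr>"
    "<tr><td>2001</td><td><a href='/wiki/Example'>Example</a></td></tr></table>"
)
rows = extract_rows(sample_html)
```

Anchor text inside a cell is kept as part of the cell text, while the link target is discarded.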

Matches file

Column names used only in the matches file (`wikary_matches.csv`):

`rowIndex` - index of the row in the given table where the match was found

`wikidata_ids` - Wikidata entities in the row, including `pageEntity`

`entities_index` - indexes of the cells in which Wikidata entities were found; -1 denotes the Wikidata entity associated with the page, and -9 denotes a cell that contains a date

`entities_anchor` - anchor text of cells where Wikidata entities were found

`entities_cell_text` - cell text of cells where Wikidata entities were found

`subject` - subject of the Wikidata statement

`property` - property of the Wikidata statement

`object` - object of the Wikidata statement

`property_qualifier` - qualifier property of the Wikidata statement

`qualifier_value` - qualifier value of the Wikidata statement

`id_match` - 1 means that the row contains a *Wikidata identifier match*, 0 means no match

`year_match` - 1 means that the row contains a *Year cell match*, 0 means no match

`year_part_match` - 1 means that the row contains a *Within cell year match*, 0 means no match
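
The special values of `entities_index` and the three match flags can be decoded as sketched below. The function names are hypothetical. The README does not say how list-valued columns are serialized in the CSV, so `describe_index` takes an integer that has already been parsed. The flags are assumed to be stored as numeric text such as `1` or `0`.

```python
# Special values of entities_index, as described in the README.
PAGE_ENTITY = -1  # the Wikidata entity associated with the page
DATE_CELL = -9    # a cell that contains a date

MATCH_FLAGS = ("id_match", "year_match", "year_part_match")


def describe_index(index):
    """Explain where an entity with the given entities_index value was found."""
    if index == PAGE_ENTITY:
        return "page entity"
    if index == DATE_CELL:
        return "date cell"
    if index >= 0:
        return f"cell {index}"
    raise ValueError(f"unexpected entities_index value: {index}")


def match_kinds(row):
    """Return the names of the match flags that are set to 1 in a row."""
    # Assumption: flag values parse as numbers (for example "1" or "0").
    return [name for name in MATCH_FLAGS if float(row[name]) == 1]


# Hypothetical values, not taken from the dataset.
descriptions = [describe_index(i) for i in (-1, 2, -9)]
kinds = match_kinds({"id_match": "1", "year_match": "0", "year_part_match": "1"})
```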

Files (379.2 MB)

Name	MD5	Size
README.md	1eb7ed7205fe570ba7b552dbe8b68c8d	1.7 kB
wikary_matches.csv	1831c890c537f80421391b1bd8a693be	103.6 MB
wikary_nary_tables.csv	9d0e02349a638a20e417faece9d2f8f1	275.6 MB
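
The MD5 checksums listed above can be used to check a download for corruption. The sketch below uses the standard `hashlib` module. The function names and the local paths passed to them are hypothetical, while the expected digests are copied from the file listing.

```python
import hashlib

# Checksums copied from the file listing on this page.
EXPECTED_MD5 = {
    "README.md": "1eb7ed7205fe570ba7b552dbe8b68c8d",
    "wikary_matches.csv": "1831c890c537f80421391b1bd8a693be",
    "wikary_nary_tables.csv": "9d0e02349a638a20e417faece9d2f8f1",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, name):
    """Return True if the file at `path` matches the listed checksum for `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, `is_intact("wikary_matches.csv", "wikary_matches.csv")` would check a copy saved in the current directory. Reading in chunks keeps memory use low for the larger CSV files.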

