Published August 25, 2022 | Version v1

Wikary: A Dataset of N-ary Wikipedia Tables Matched to Qualified Wikidata Statements

Description


Created for the SemTab 2022 Datasets Track challenge.

General

Column names used in both files

`lang` - language and Wikipedia version used

`pageTitle` - page title

`tableIndex` - index of the table on the page
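Because `lang`, `pageTitle`, and `tableIndex` appear in both files, they can presumably serve as a composite key linking each match to its table. A minimal sketch, assuming each file has already been loaded as a list of dicts keyed by column name (the file format and loading step are not specified here); `table_key` and `group_matches_by_table` are hypothetical helper names:

```python
from typing import Any, Dict, Iterable, List, Tuple

# Assumption: (lang, pageTitle, tableIndex) identifies one table in both files.
TableKey = Tuple[str, str, int]


def table_key(record: Dict[str, Any]) -> TableKey:
    """Build the composite key from a tables-file or matches-file record."""
    # int() tolerates tableIndex stored as text, e.g. "0" read from a CSV file.
    return (record["lang"], record["pageTitle"], int(record["tableIndex"]))


def group_matches_by_table(
    tables: Iterable[Dict[str, Any]],
    matches: Iterable[Dict[str, Any]],
) -> Dict[TableKey, List[Dict[str, Any]]]:
    """Map every table key to the list of matches records that share it."""
    grouped: Dict[TableKey, List[Dict[str, Any]]] = {table_key(t): [] for t in tables}
    for m in matches:
        key = table_key(m)
        if key in grouped:  # matches without a corresponding table are skipped
            grouped[key].append(m)
    return grouped


# Toy records for illustration only; they are not taken from the dataset.
tables = [{"lang": "en", "pageTitle": "Example page", "tableIndex": 0}]
matches = [
    {"lang": "en", "pageTitle": "Example page", "tableIndex": "0", "rowIndex": 2},
    {"lang": "en", "pageTitle": "Other page", "tableIndex": 1, "rowIndex": 0},
]
grouped = group_matches_by_table(tables, matches)
```

Keying on the full triple rather than on `pageTitle` alone keeps tables from different language editions, and different tables on the same page, apart.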

Tables file

Column names used only in the tables file

`pageEntity` - Wikidata entity associated with the page

`sectionTitle` - title of the section in which the table is located

`tableCaption` - caption of the table

`headers` - headers of the table

`HTML` - HTML of the table
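The `HTML` column holds the table markup. As a rough sketch, Python's standard-library `html.parser` module can recover the text of each cell row by row. `TableTextExtractor` and `table_rows` are hypothetical names; the sketch ignores `rowspan`/`colspan` and does not handle nested tables, so it will not faithfully reconstruct every Wikipedia layout.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableTextExtractor(HTMLParser):
    """Collect the whitespace-normalized text of each <th>/<td> cell, row by row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def _end_cell(self) -> None:
        if self._cell is not None and self._row is not None:
            # Collapse internal whitespace so "Best \n Film" becomes "Best Film".
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _end_row(self) -> None:
        self._end_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()  # also closes a row whose </tr> was omitted
            self._row = []
        elif tag in ("td", "th"):
            self._end_cell()  # also closes a cell whose </td> was omitted
            if self._row is not None:
                self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)  # text inside links etc. is kept as plain text


def table_rows(html: str) -> List[List[str]]:
    """Return the cell texts of a table's HTML as a list of rows."""
    parser = TableTextExtractor()
    parser.feed(html)
    parser.close()
    parser._end_row()  # flush a trailing row if the markup was truncated
    return parser.rows
```

For example, `table_rows("<table><tr><th>Year</th></tr><tr><td>2001</td></tr></table>")` yields `[["Year"], ["2001"]]`.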

Matches file

Column names used only in the matches file

`rowIndex` - index of the row in the given table in which the match was found

`wikidata_ids` - Wikidata entities found in the row, including `pageEntity`

`entities_index` - indexes of the cells in which the Wikidata entities were found; -1 is used for the Wikidata entity associated with the page, and -9 for cells that contain a date

`entities_anchor` - anchor text of the cells in which Wikidata entities were found

`entities_cell_text` - text of the cells in which Wikidata entities were found

`subject` - subject of the Wikidata statement

`property` - property of the Wikidata statement

`object` - object of the Wikidata statement

`property_qualifier` - qualifier property of the Wikidata statement

`qualifier_value` - qualifier value of the Wikidata statement

`id_match` - 1 if the row contains a *Wikidata identifier match*, 0 otherwise

`year_match` - 1 if the row contains a *Year cell match*, 0 otherwise

`year_part_match` - 1 if the row contains a *Within cell year match*, 0 otherwise
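One way to interpret a matches record, sketched under assumptions: `wikidata_ids` and `entities_index` are taken to be already parsed into Python lists that are aligned position by position (how they are serialized in the file is not stated here), and the match flags may arrive as integers or as text. The constants and function names are hypothetical; the -1 and -9 values and the five statement columns come from the column descriptions above.

```python
from typing import Any, Dict, List, Tuple

PAGE_ENTITY_INDEX = -1  # entity is the page's own Wikidata entity (pageEntity)
DATE_CELL_INDEX = -9    # marker for a cell that contains a date

MATCH_FLAGS = ("id_match", "year_match", "year_part_match")


def describe_entity_positions(
    wikidata_ids: List[str], entities_index: List[int]
) -> List[Tuple[str, str]]:
    """Pair each entry in wikidata_ids with where it was found in the row."""
    if len(wikidata_ids) != len(entities_index):
        raise ValueError("wikidata_ids and entities_index must have the same length")
    described = []
    for qid, idx in zip(wikidata_ids, entities_index):
        if idx == PAGE_ENTITY_INDEX:
            where = "page entity"
        elif idx == DATE_CELL_INDEX:
            where = "date cell"
        else:
            where = f"cell {idx}"
        described.append((qid, where))
    return described


def matched_by(row: Dict[str, Any]) -> List[str]:
    """Return the names of the match flags that are set to 1 in a record."""
    return [flag for flag in MATCH_FLAGS if int(row[flag]) == 1]


def as_statement(row: Dict[str, Any]) -> Tuple[Any, Any, Any, Any, Any]:
    """Collect the qualified Wikidata statement columns into one tuple."""
    return (
        row["subject"],
        row["property"],
        row["object"],
        row["property_qualifier"],
        row["qualifier_value"],
    )
```

Keeping the sentinel values as named constants makes it harder to mistake -1 or -9 for a real cell position when indexing into a row.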


Files (379.2 MB), including README.md

1.7 kB, md5:1eb7ed7205fe570ba7b552dbe8b68c8d

103.6 MB, md5:1831c890c537f80421391b1bd8a693be

275.6 MB, md5:9d0e02349a638a20e417faece9d2f8f1
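To check a download against the MD5 checksums listed above, a minimal sketch using Python's standard `hashlib` module. `md5_of` and `verify` are hypothetical helpers, and the local path is whatever name the file was saved under.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's MD5 hex digest, reading 1 MiB at a time to bound memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare against a checksum written as 'md5:<hex>' (the prefix is optional)."""
    return md5_of(path) == expected.split(":", 1)[-1].strip().lower()
```

Reading in chunks matters here because the larger files are hundreds of megabytes.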