Published June 19, 2021 | Version v1.0
Dataset Open

Semantic Annotation for Tabular Data with DBpedia: Adapted SemTab 2019 with DBpedia 2016-10

  • 1. National Institute of Informatics
  • 2. National Institute of Advanced Industrial Science and Technology

Description

GitHub: https://github.com/phucty/mtab4dbpedia
---------------------------------------------------------------------------------------------------------------------------------------

CEA (Cell-Entity Annotation):

  • Keep only valid entities in DBpedia 2016-10

  • Resolve percentage encoding

  • Add missing redirect entities
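The percent-encoding step can be sketched with Python's standard library. This is only an illustration: the sample URI and the helper name `decode_entity_uri` are assumptions, and the authors' actual procedure may differ.

```python
from urllib.parse import unquote

DBPEDIA_PREFIX = "http://dbpedia.org/resource/"


def decode_entity_uri(uri: str) -> str:
    """Return the URI with percent-encoded bytes decoded as UTF-8."""
    return unquote(uri)


# Hypothetical input, not taken from the dataset.
encoded = DBPEDIA_PREFIX + "Kant%C5%8D_region"
print(decode_entity_uri(encoded))
```

Decoding before matching means an annotation written as `Kant%C5%8D_region` and one written as `Kantō_region` compare as the same entity.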

CTA (Column-Type Annotation):

  • Keep only valid types

  • Resolve transitive types (parents and equivalent types of the specific type) with DBpedia ontology 2016-10

CPA (Columns-Property Annotation):

  • Add equivalent properties

Statistics of the adapted SemTab 2019 tabular data (number of annotation targets before and after adaptation)

|         | CEA Original | CEA Adapted | CEA Change | CPA Original | CPA Adapted | CPA Change | CTA Original | CTA Adapted | CTA Change |
|---------|:------------:|:-----------:|:----------:|:------------:|:-----------:|:----------:|:------------:|:-----------:|:----------:|
| Round 1 | 8418     | 8406    | -0.14% | 116      | 116     | 0.00%  | 120      | 120     | 0.00%  |
| Round 2 | 463796   | 457567  | -1.34% | 6762     | 6762    | 0.00%  | 14780    | 14333   | -3.02% |
| Round 3 | 406827   | 406820  | 0.00%  | 7575     | 7575    | 0.00%  | 5762     | 5673    | -1.54% |
| Round 4 | 107352   | 107351  | 0.00%  | 2747     | 2747    | 0.00%  | 1732     | 1717    | -0.87% |
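The Change columns appear to be the relative difference between the adapted and original counts. A minimal sketch of that arithmetic, using counts copied from the table (the function name `percent_change` is an assumption):

```python
def percent_change(original: int, adapted: int) -> float:
    """Relative change from the original to the adapted count, in percent."""
    return (adapted - original) / original * 100.0


# (original, adapted) pairs copied from the table above.
rows = {
    "Round 1 CEA": (8418, 8406),
    "Round 2 CEA": (463796, 457567),
    "Round 2 CTA": (14780, 14333),
    "Round 3 CTA": (5762, 5673),
    "Round 4 CTA": (1732, 1717),
}

for name, (original, adapted) in rows.items():
    print(f"{name}: {percent_change(original, adapted):.2f}%")
```

Rounded to two decimals, these reproduce the table's -0.14%, -1.34%, -3.02%, -1.54% and -0.87%.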

 

---------------------------------------------------------------------------------------------------------------------------------------
DBpedia 2016-10 extra resources: (Original dataset http://downloads.dbpedia.org/2016-10/)

---------------------------------------------------------------------------------------------------------------------------------------

File: _dbpedia_classes_2016-10.csv

Information: DBpedia classes and their parents (the abstract types Agent and Thing are removed)

Total: 759 classes

Structure: [class, parents (separated by spaces)] (without the prefix dbo: or http://dbpedia.org/ontology/)

Example: "City","Location Place PopulatedPlace Settlement"
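A minimal sketch of reading this file and expanding a specific type into the set of types it implies. It assumes the file has no header row and that every row follows the two-column layout above; the helper names are hypothetical.

```python
import csv
import io


def load_class_parents(lines):
    """Map each class name to the list of its parent class names."""
    parents = {}
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        parent_field = row[1] if len(row) > 1 else ""
        parents[row[0]] = parent_field.split()
    return parents


def accepted_types(specific_type, class_parents):
    """Return the specific type followed by its listed parents."""
    return [specific_type] + class_parents.get(specific_type, [])


# In practice, pass an open file instead, e.g.
# open("_dbpedia_classes_2016-10.csv", newline="", encoding="utf-8").
sample = io.StringIO('"City","Location Place PopulatedPlace Settlement"\n')
class_parents = load_class_parents(sample)
print(accepted_types("City", class_parents))
```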

---------------------------------------------------------------------------------------------------------------------------------------

File: _dbpedia_properties_2016-10.csv

Information: DBpedia properties and their equivalents

Total: 2865 properties

Structure: [property, its equivalent properties] (without the prefix dbo: or http://dbpedia.org/ontology/)

Example: "restingDate","deathDate"
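One way to use this file when scoring CPA answers is to accept any property equivalent to the gold one. The sketch below makes two assumptions that the description does not confirm: multiple equivalents in the second column are separated by spaces (as in the classes file), and equivalence is treated as symmetric. The helper names are hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_equivalents(lines):
    """Build a symmetric lookup: property -> set of equivalent properties."""
    equivalents = defaultdict(set)
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # no equivalents listed on this row
        prop = row[0]
        for other in row[1].split():  # assumption: space-separated
            equivalents[prop].add(other)
            equivalents[other].add(prop)  # assumption: symmetric relation
    return equivalents


def expand_property(prop, equivalents):
    """Return the property together with all of its known equivalents."""
    return {prop} | equivalents.get(prop, set())


sample = io.StringIO('"restingDate","deathDate"\n')
equivalents = load_equivalents(sample)
print(sorted(expand_property("deathDate", equivalents)))
```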

---------------------------------------------------------------------------------------------------------------------------------------

File: _dbpedia_domains_2016-10.csv

Information: DBpedia properties and their domain types

Total: 2421 properties (those that have a type as their domain)

Structure: [property, type (domain)] (without prefix dbo: or http://dbpedia.org/ontology/)

Example: "deathDate","Person"

---------------------------------------------------------------------------------------------------------------------------------------

File: _dbpedia_entities_2016-10.jsonl.bz2 

Information: DBpedia entity dump

Format: bz2-compressed JSON list (.jsonl.bz2)

Source: DBpedia dump 2016-10 core

Total: 5,289,577 entities (disambiguation entities are excluded)

Structure:

Each entity is a dictionary. For example, the entity “Tokyo”:

{

'wd': 'Q1322032', (Wikidata ID, datatype: string)

'wp': 'Tokyo', (Wikipedia ID, add prefix https://en.wikipedia.org/wiki/ + wp to get the Wikipedia URL, datatype: string)

'dp': 'Tokyo', (DBpedia ID, add prefix http://dbpedia.org/resource/ + dp to get the DBpedia URL, datatype: string)

'label': 'Tokyo', (Entity label, datatype: string)

'aliases': ['To-kyo', 'Tôkyô Prefecture', ...], (Other entity names, datatype: list)

'aliases_multilingual': ['东京小子', 'طوكيو', ...], (Entity names in other languages, datatype: list)

'types_specific': 'City', (Entity direct type, datatype: string) 

'types_transitive': ['Human settlement', 'City', 'PopulatedPlace', 'Location', 'Place', 'Settlement'], (Entity transitive types, datatype: list)

'claims_entity': { (entity statements, datatype: dictionary. Keys: properties, Values: list of tail entities)

'governingBody': ['Tokyo Metropolitan Government'], 

'subdivision': ['Honshu', 'Kantō region'],

...

},

'claims_literal': { (literal statements, datatype: dictionary. Keys: literal kinds)

'string': { (String literals, datatype: dictionary. Keys: properties, Values: list of values)

'postalCode': ['JP-13'],

'utcOffset': ['+09:00', '+9'],

},

'time': { (Time literals, datatype: dictionary. Keys: properties, Values: list of date-time values)

'populationAsOf': ['2016-07-31'],

...

},

'quantity': { (Numerical literals, datatype: dictionary. Keys: properties, Values: list of values)

'populationDensity': [6224.66, 6349.0],

'maximumElevation': [2017],

...

}

},

'pagerank': 2.2167366040153352e-06 (Entity PageRank score calculated on the DBpedia graph)

}
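The dump can be streamed without decompressing it to disk first. The sketch below assumes one JSON object per line (as the .jsonl extension suggests) and UTF-8 encoding; the function names `iter_entities` and `entity_urls` are hypothetical, while the URL prefixes come from the field descriptions above.

```python
import bz2
import json

WIKIPEDIA_PREFIX = "https://en.wikipedia.org/wiki/"
DBPEDIA_PREFIX = "http://dbpedia.org/resource/"


def iter_entities(path):
    """Yield one entity dictionary per non-empty line of the compressed file."""
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def entity_urls(entity):
    """Build the Wikipedia and DBpedia URLs from the 'wp' and 'dp' IDs."""
    return {
        "wikipedia": WIKIPEDIA_PREFIX + entity["wp"],
        "dbpedia": DBPEDIA_PREFIX + entity["dp"],
    }


# Example usage (requires the downloaded file):
# for entity in iter_entities("_dbpedia_entities_2016-10.jsonl.bz2"):
#     print(entity["label"], entity_urls(entity)["dbpedia"])
#     break
```

Iterating line by line keeps memory use low, which matters for a 1.4 GB compressed file with over five million entities.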

---------------------------------------------------------------------------------------------------------------------------------------

THIS DATA IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Notes

This data is redistributed from:

  • SemTab 2019: https://doi.org/10.5281/zenodo.3518530

  • Wikipedia: https://www.wikipedia.org/

  • DBpedia: http://dbpedia.org/

  • T2Dv2 Gold Standard for Matching Web Tables to DBpedia: http://webdatacommons.org/webtables/goldstandardV2.html

Please refer to the licenses of these sources.

Files (1.4 GB)

  • md5:c9c2610533b87e366193566dcefadbbb (33.7 kB)

  • md5:9f78b253eba405e3587c90df1c02487e (72.3 kB)

  • md5:102dee1c3848b4206bb92e8c94f77b58 (1.4 GB)

  • md5:5ba557fd1e145964d4729935219feba2 (57.1 kB)

  • md5:8501aa7ea2c615359e511c1057771743 (42.0 MB)
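The md5 checksums listed above can be used to check a download for corruption. A minimal sketch with Python's standard library follows; the helper names are hypothetical, and the listing does not show which checksum belongs to which file, so the pairing has to be taken from the record itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

Reading in chunks avoids loading the 1.4 GB entity dump into memory at once.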

Additional details

Related works

Is cited by
Dataset: 10.5281/zenodo.3518539 (DOI)