Published January 24, 2020 | Version 1.0
Dataset Open

Resources for reproducing experiments in "Novel Entity Discovery from Web Tables"

  • 1. Bloomberg
  • 2. University of Stavanger

Description

This repository contains resources developed for the paper: "S. Zhang, E. Meij, K. Balog, and R. Reinanda. Novel Entity Discovery from Web Tables. In: Proceedings of The Web Conference 2020 (WWW ’20), April 2020".

It includes the three test collections for novel entity discovery for Web tables, entity type and mention resolution, as well as the mention-entity and heading-property correspondences for 3M tables. The cited datasets were used in this work.

Files to recreate the entity linking experiments:

  • training_el.csv
  • training_el_type.csv
  • training_el_type_wiki.csv
  • training_el_wiki.csv
  • training_schema.csv

Files to recreate the table matching experiments:

  • me_corres.csv - textual cells algorithmically linked to Wikipedia entities
  • hp_corres.csv - the same, but for table headings only

Files to recreate the entity resolution experiments:

  • ec_golden.csv - 20K unlinked mentions (textual cells), manually linked to Wikipedia
  • er_sf_golden.csv - 1K cell values, manually clustered
  • er_type_golden.csv - 1K cell values, manually linked to DBpedia types
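
The column layout of these CSV files is not documented on this page, so a quick look at the first rows of each file can help before running any experiment. The following is a minimal sketch, not part of the original resources. It assumes the CSV files sit inside the zip archive named in the Files section, are comma-delimited, and are UTF-8 encoded; the function name `preview_csvs` is hypothetical.

```python
import csv
import io
import zipfile
from itertools import islice


def preview_csvs(archive_path, max_rows=3):
    """Map each .csv member of a zip archive to its first `max_rows` rows.

    Assumptions (not confirmed by the dataset page): members are
    comma-delimited and UTF-8 encoded. Undecodable bytes are replaced
    rather than raising an error.
    """
    previews = {}
    with zipfile.ZipFile(archive_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip directories and non-CSV members
            with archive.open(name) as raw:
                text = io.TextIOWrapper(
                    raw, encoding="utf-8", errors="replace", newline=""
                )
                reader = csv.reader(text)
                # islice stops after max_rows, so large files are not read fully
                previews[name] = list(islice(reader, max_rows))
    return previews


if __name__ == "__main__":
    import os

    path = "www2020-webtables-v1.0.zip"  # assumed local download location
    if os.path.exists(path):
        for member, rows in preview_csvs(path).items():
            print(member)
            for row in rows:
                print("   ", row)
```

Reading directly from the archive avoids unpacking all 466.1 MB just to inspect the headers.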

Files

www2020-webtables-v1.0.zip (466.1 MB)
md5:f389e09f86080d83f76ff24d777d9b7f
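
The MD5 checksum listed above can be used to confirm that the archive downloaded intact. A minimal sketch using Python's standard `hashlib` module follows. The helper names `md5_of_file` and `verify_archive` and the local file path are illustrative assumptions, not part of the dataset.

```python
import hashlib
from pathlib import Path

# Digest copied from the file listing on this page.
EXPECTED_MD5 = "f389e09f86080d83f76ff24d777d9b7f"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches `expected` (case-insensitive)."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("www2020-webtables-v1.0.zip")  # assumed local download location
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

Reading in chunks keeps memory use flat regardless of archive size. A mismatch usually indicates an incomplete or corrupted download.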

Additional details