Published June 28, 2023 | Version v3
Conference paper Open

Knowledge Base Completion for Long-Tail Entities

  • 1. Institut Polytechnique de Paris
  • 2. Max Planck Institute for Informatics

Description

We developed a new dataset with an emphasis on the long-tail challenge,
called MALT (for “Multi-token, Ambiguous, Long-Tailed facts”).
In the dataset, 65.3% of the triple facts have an object (O) entity that is a multi-word phrase, and 58.6%
are ambiguous facts, where the subject (S) or object (O) entities share identical alias names in Wikidata.

For example, the two ambiguous entities “Birmingham, West
Midlands (Q2256)” and “Birmingham, Alabama (Q79867)” have the same label value, “Birmingham”.
In total, 87.0% of the sample facts have entities in the long tail, where we define long-tail entities to have at most 13 Wikidata triples.
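
The long-tail criterion above can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that an entity's triple count is the number of triples in which it appears as the subject, which the description does not specify, and the function names and toy identifiers (E1, E2, P0, V0, ...) are made up for illustration. In practice the counts would come from Wikidata itself rather than from the sampled facts.

```python
from collections import Counter

# Threshold taken from the description: at most 13 Wikidata triples.
LONG_TAIL_MAX_TRIPLES = 13


def count_subject_triples(triples):
    """Count triples per entity.

    Assumption: only triples with the entity in subject position are counted.
    """
    return Counter(s for s, _p, _o in triples)


def is_long_tail(entity, counts, max_triples=LONG_TAIL_MAX_TRIPLES):
    """Return True if the entity has at most `max_triples` triples."""
    # A Counter returns 0 for unseen entities, so they count as long-tail.
    return counts[entity] <= max_triples


# Toy data with hypothetical identifiers: E1 has 14 triples, E2 has 1.
toy_triples = [("E1", f"P{i}", f"V{i}") for i in range(14)] + [("E2", "P1", "V1")]
toy_counts = count_subject_triples(toy_triples)

print(is_long_tail("E1", toy_counts))
print(is_long_tail("E2", toy_counts))
```

With the threshold of 13, the toy entity E2 falls in the long tail and E1, with 14 triples, does not.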

Files

MALT.zip (217.5 MB)
md5:212743f425f0d538600daad2b0ac6e40
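
The MD5 checksum listed above can be used to check that a downloaded copy of MALT.zip is intact. The following is a minimal sketch using Python's standard hashlib module. The helper names and the local path "MALT.zip" are assumptions for illustration; only the checksum value comes from this record.

```python
import hashlib

# Checksum as listed in the Files section of this record.
EXPECTED_MD5 = "212743f425f0d538600daad2b0ac6e40"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_expected("MALT.zip")` should return True for an uncorrupted download, assuming the archive was saved under that name in the current directory.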