Knowledge Base Completion for Long-Tail Entities
Creators
- 1. Institut Polytechnique de Paris
- 2. Max Planck Institute for Informatics
Description
We developed MALT (“Multi-token, Ambiguous, Long-Tailed facts”),
a new dataset that emphasizes the long-tail challenge.
In 65.3% of the dataset's triple facts, the object (O) entity is a multi-word phrase, and 58.6% are
ambiguous facts, in which the subject (S) or object (O) entity shares an identical alias with another entity in Wikidata.
For example, the two ambiguous entities “Birmingham, West
Midlands (Q2256)” and “Birmingham, Alabama (Q79867)” have the same label value, “Birmingham”.
In total, 87.0% of the sample facts have entities in the long tail, where we define long-tail entities to have at most 13 Wikidata triples.
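The three criteria described above can be illustrated with a minimal Python sketch. It is not the authors' construction pipeline. The function names, the whitespace-based notion of "multi-word", and the in-memory alias dictionary are all assumptions made for illustration; only the threshold of 13 triples and the two Birmingham entity IDs come from the description.

```python
from collections import defaultdict
from typing import Dict, Set

# Threshold taken from the description: at most 13 Wikidata triples.
LONG_TAIL_MAX_TRIPLES = 13


def is_long_tail(triple_count: int) -> bool:
    """Return True if an entity with this many triples counts as long-tail."""
    return triple_count <= LONG_TAIL_MAX_TRIPLES


def is_multi_word(label: str) -> bool:
    """Assumption: 'multi-word' means more than one whitespace-separated token."""
    return len(label.split()) > 1


def ambiguous_entities(aliases: Dict[str, Set[str]]) -> Set[str]:
    """Return the IDs of entities that share at least one alias with another entity."""
    owners = defaultdict(set)
    for entity_id, names in aliases.items():
        for name in names:
            owners[name].add(entity_id)
    return {eid for ids in owners.values() if len(ids) > 1 for eid in ids}


if __name__ == "__main__":
    # Hypothetical alias table; the third entry is a made-up placeholder ID.
    aliases = {
        "Q2256": {"Birmingham"},
        "Q79867": {"Birmingham"},
        "Q_HYPOTHETICAL": {"Some Other Place"},
    }
    print(sorted(ambiguous_entities(aliases)))  # ['Q2256', 'Q79867']
    print(is_long_tail(13), is_long_tail(14))   # True False
    print(is_multi_word("Birmingham, Alabama"))  # True
```

If "multi-token" in MALT refers to language-model tokens rather than whitespace-separated words, `is_multi_word` would need to be replaced by a tokenizer-based check.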
Files
Name | Size |
---|---|
MALT.zip (md5:212743f425f0d538600daad2b0ac6e40) | 217.5 MB |
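The md5 checksum listed for MALT.zip can be used to check that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names and the local path `MALT.zip` are assumptions; only the expected checksum comes from this page.

```python
import hashlib

# Checksum listed for MALT.zip on this page.
EXPECTED_MD5 = "212743f425f0d538600daad2b0ac6e40"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's md5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

Assuming the archive was saved as `MALT.zip` in the working directory, `verify_download("MALT.zip")` should return `True` for an uncorrupted copy.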