Dataset Open Access

Luminoso Input Data for SemEval-2018 Task 10: "Capturing Discriminative Attributes"

Speer, Robert; Lowry-Duda, Joanna

This is the data required to run Luminoso's entry to the SemEval-2018 task on Capturing Discriminative Attributes.

This data includes:

  • A recently computed version of the ConceptNet Numberbatch word embeddings
  • The output of an implementation of Semantic Matching Energy over ConceptNet
  • A SQLite database containing the lead section of all articles on the English Wikipedia on 2017-12-20
  • The text file from which that database is constructed
  • A SQLite database of words that co-occur in Google Books 2-grams
  • The text file containing total counts of 2-grams in the Google Books data, which that database is constructed from
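
The record does not document the table layout of the two SQLite databases, so a reasonable first step after unzipping is to list the tables and their schemas. The following is a minimal sketch using only Python's standard library; the function name `list_tables` and the path passed to it are hypothetical and not part of the dataset.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return a dict mapping each table name to its CREATE TABLE statement."""
    # Open read-only so that a mistyped path raises an error instead of
    # silently creating a new, empty database file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name, sql FROM sqlite_master "
            "WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return {name: sql for name, sql in rows}
```

Calling `list_tables("path/to/database.db")` (a placeholder path; substitute the actual location of one of the extracted databases) shows which tables exist before any queries are written against them.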

For more information, see the paper "Luminoso at SemEval-2018 Task 10: Distinguishing Attributes Using Text Corpora and Relational Knowledge", by Robert Speer and Joanna Lowry-Duda, to appear in the proceedings of the SemEval workshop at NAACL 2018.

Files (8.8 GB)

  • luminoso-semeval2018-task10-data.zip (8.8 GB)
    md5: ac4dc9c7df25068f47fe8e97cf6f4a37
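
Because the archive is large, it may be worth checking the download against the MD5 checksum listed above before unzipping. The following is a minimal sketch using Python's standard library; the helper names `md5_of_file` and `verify_download` are illustrative, and the local path is assumed to be wherever the archive was saved.

```python
import hashlib

# MD5 checksum published for luminoso-semeval2018-task10-data.zip
EXPECTED_MD5 = "ac4dc9c7df25068f47fe8e97cf6f4a37"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so the 8.8 GB archive is never held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_download("luminoso-semeval2018-task10-data.zip")` returns `True` only if the file in the current directory matches the published checksum.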