Dataset Open Access

Luminoso Input Data for SemEval-2018 Task 10: "Capturing Discriminative Attributes"

Speer, Robyn; Lowry-Duda, Joanna

This is the data required to run Luminoso's entry to the SemEval-2018 task on Capturing Discriminative Attributes.

This data includes:

  • A recently computed version of the ConceptNet Numberbatch word embeddings
  • The output of an implementation of Semantic Matching Energy over ConceptNet
  • A SQLite database containing the lead section of all articles on the English Wikipedia on 2017-12-20
  • The text file from which that database is constructed
  • A SQLite database of words that co-occur in Google Books 2-grams
  • The text file containing total counts of 2-grams in the Google Books data, which that database is constructed from
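
Since this description does not spell out the table layout of the two SQLite databases, one way to get oriented after downloading is to list their tables and columns. The following is a minimal sketch using only Python's standard sqlite3 module; the function name describe_sqlite is a hypothetical helper introduced here, and the file path passed to it is whatever name the database has once unpacked, which this page does not state.

```python
import sqlite3


def describe_sqlite(db_path):
    """Return {table_name: [column names]} for every table in the database.

    Hypothetical helper for illustration; it makes no assumption about the
    actual schema of the files in this dataset.
    """
    # Open read-only so the downloaded file is never modified.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        schema = {}
        for table in tables:
            # Quote the table name defensively before using it in the PRAGMA.
            quoted = '"' + table.replace('"', '""') + '"'
            # Each row is (cid, name, type, notnull, dflt_value, pk).
            columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
            schema[table] = [column[1] for column in columns]
        return schema
    finally:
        conn.close()
```

Calling describe_sqlite on the path of either database returns a dictionary mapping each table name to its column names, which is usually enough to start writing queries against the Wikipedia lead sections or the 2-gram co-occurrence data.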

For more information, see the paper "Luminoso at SemEval-2018 Task 10: Distinguishing Attributes Using Text Corpora and Relational Knowledge", by Robyn Speer and Joanna Lowry-Duda, to appear in the proceedings of the SemEval workshop at NAACL 2018.

Files (8.8 GB)