Luminoso Input Data for SemEval-2018 Task 10: "Capturing Discriminative Attributes"
This is the data required to run Luminoso's entry to the SemEval-2018 task on Capturing Discriminative Attributes.
This data includes:
- A recently computed version of the ConceptNet Numberbatch word embeddings
- The output of an implementation of Semantic Matching Energy over ConceptNet
- A SQLite database containing the lead section of all articles on the English Wikipedia on 2017-12-20
- The text file from which that database is constructed
- A SQLite database of words that co-occur in Google Books 2-grams
- The text file of total 2-gram counts in the Google Books data, from which that database is constructed
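
This README does not describe the table layout of the two SQLite databases listed above, so a first step is to inspect them directly. The following is a minimal sketch that uses only Python's standard `sqlite3` module. The helper name `list_tables` and the file name in the usage comment are hypothetical and are not part of the distributed data.

```python
import sqlite3


def list_tables(db_path):
    """Return {table_name: [column names]} for every table in a SQLite file.

    Note: sqlite3.connect() creates an empty database file if db_path does
    not exist, so check the path before calling this on the real data.
    """
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master is SQLite's built-in catalog of schema objects.
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        schema = {}
        for name in names:
            # Quote the identifier so unusual table names do not break the PRAGMA.
            quoted = '"' + name.replace('"', '""') + '"'
            # PRAGMA table_info yields one row per column; index 1 is the column name.
            columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
            schema[name] = [col[1] for col in columns]
        return schema
    finally:
        conn.close()


# Hypothetical usage; substitute the actual database file name:
# print(list_tables("wikipedia-summary.db"))
```

The function only reads schema metadata, so it returns quickly even on large databases.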
For more information, see the paper "Luminoso at SemEval-2018 Task 10: Distinguishing Attributes Using Text Corpora and Relational Knowledge", by Robyn Speer and Joanna Lowry-Duda, to appear in the proceedings of the SemEval workshop at NAACL 2018.