Knowledge bases for explainable benchmarking (QALD10, QALD9+DB, QALD9+WK)
Description
This project provides three knowledge graphs that we created for the three QA benchmarks: QALD-9 plus DBpedia, QALD-9 plus Wikidata, and QALD-10. Here are some more details:
1. Preprocessing
- We remove all questions from the three QA datasets that have an empty ground truth answer set.
- We preprocessed the DBpedia reference graph by:
  - Removing 43,618 triples with IRIs that do not pass through the RDF checker.
  - Removing properties of the http://dbpedia.org/property/ namespace.
  - Inferring the classes of all entities based on the class hierarchy.
- We preprocessed Wikidata by replacing the property http://www.wikidata.org/prop/direct/P31 with http://www.w3.org/1999/02/22-rdf-syntax-ns#type.
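The preprocessing steps above can be illustrated with a minimal sketch. It is not the authors' pipeline: triples are modelled as plain Python tuples, the question records use a hypothetical `answers` key, and the class hierarchy is assumed to be expressed with rdfs:subClassOf, which the description does not state explicitly. The RDF checker step is omitted because the description does not say which checker was used.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Assumption: the class hierarchy is given as rdfs:subClassOf triples.
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
WDT_P31 = "http://www.wikidata.org/prop/direct/P31"
DBP_NS = "http://dbpedia.org/property/"


def drop_unanswered(questions):
    """Keep only questions whose ground truth answer set is non-empty.

    `questions` is a list of dicts with a hypothetical "answers" key.
    """
    return [q for q in questions if q["answers"]]


def drop_dbp_properties(triples):
    """Remove triples whose predicate lies in the dbpedia.org/property/ namespace."""
    return {(s, p, o) for s, p, o in triples if not p.startswith(DBP_NS)}


def replace_p31(triples):
    """Rewrite wdt:P31 predicates to rdf:type, leaving other triples unchanged."""
    return {(s, RDF_TYPE if p == WDT_P31 else p, o) for s, p, o in triples}


def infer_classes(triples):
    """Add an rdf:type triple for every superclass reachable from an entity's classes."""
    parents = {}
    for s, p, o in triples:
        if p == RDFS_SUBCLASS_OF:
            parents.setdefault(s, set()).add(o)

    inferred = set(triples)
    for s, p, o in triples:
        if p != RDF_TYPE:
            continue
        # Walk up the hierarchy; `seen` guards against cycles.
        stack, seen = [o], {o}
        while stack:
            cls = stack.pop()
            for sup in parents.get(cls, ()):
                if sup not in seen:
                    seen.add(sup)
                    stack.append(sup)
                    inferred.add((s, RDF_TYPE, sup))
    return inferred
```

For graphs of the size of DBpedia or Wikidata, a streaming or store-backed implementation would be needed; the in-memory sets here only serve to make the three transformations concrete.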
2. Knowledge Base Structure
In the first step of our benchmarking framework, we generate a knowledge graph comprising information from the dataset used during the benchmarking process. Our work relies on the QALD datasets, which include three types of data for each question:
- Natural language question
  Each question comes with a representation in several languages. From the English question, we extract linguistic features such as:
  - The length of the question (dqb:hasLength)
  - The presence of negation (dqb:hasNegation)
  - The question word (dqb:hasQuestionWord)
  - The NLP parse tree (dqb:hasNlpParseTreeRoot)
  Note: The prefix dqb: refers to the namespace http://w3id.org/dice-research/qa-bench#.
  Note: We employ the Stanford NLP toolkit for the extraction.
- Answer(s)
  Each question comes with the ground truth answers. We add these answers to the generated graph with three different properties distinguishing:
  - IRI answers (dqb:hasIRIAnswer)
  - Boolean answers (dqb:hasBooleanAnswer)
  - Other literal answers (dqb:hasLiteralAnswer)
  For each IRI listed as an answer, we add its concise bounded description (CBD) extracted from the reference knowledge graph.
- SPARQL query
  Each question has a SPARQL query that returns the ground truth answer when executed on the reference knowledge graph. We adopt LSQ to add the following SPARQL query features to our knowledge graph:
  - Entities (dqb:hasEntity) and properties (dqb:hasProperty) contained in the query, and the CBD of the entities
  - Type of query
  - The number of triple patterns
  - The number of basic graph patterns
  - The average degree of vertices
  - The median degree of vertices involved in join operations
  - The minimum, maximum, and median number of triple patterns in a basic graph pattern
  - The presence of certain keywords such as FILTER, DISTINCT, and GROUP BY
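To make the structure above more concrete, the following sketch shows how a single benchmark question might be turned into triples that use the dqb: properties. It rests on several assumptions that are not taken from the description: question length is counted in tokens, the negation and question word detection use simple word lists instead of the Stanford NLP toolkit, answers arrive as hypothetical (kind, value) pairs, and keyword detection uses regular expressions rather than the query parsing that LSQ performs. The parse tree, the CBDs, and the remaining LSQ features are left out.

```python
import re

DQB = "http://w3id.org/dice-research/qa-bench#"

# Illustrative word lists; a real pipeline would rely on an NLP toolkit.
QUESTION_WORDS = {"what", "which", "who", "whom", "whose", "when", "where", "why", "how"}
NEGATIONS = {"not", "no", "never", "none", "without"}
KEYWORDS = ("FILTER", "DISTINCT", "GROUP BY")


def question_features(question_iri, text):
    """Return triples for length (token count), negation, and question word."""
    tokens = re.findall(r"[A-Za-z0-9']+", text.lower())
    negated = any(t in NEGATIONS or t.endswith("n't") for t in tokens)
    triples = [
        (question_iri, DQB + "hasLength", len(tokens)),
        (question_iri, DQB + "hasNegation", negated),
    ]
    word = next((t for t in tokens if t in QUESTION_WORDS), None)
    if word is not None:
        triples.append((question_iri, DQB + "hasQuestionWord", word))
    return triples


def answer_triples(question_iri, answers):
    """Attach each answer with the property matching its kind.

    `answers` is a list of (kind, value) pairs, where kind is "uri",
    "boolean", or anything else for other literals (hypothetical format).
    """
    prop_for_kind = {"uri": "hasIRIAnswer", "boolean": "hasBooleanAnswer"}
    return [
        (question_iri, DQB + prop_for_kind.get(kind, "hasLiteralAnswer"), value)
        for kind, value in answers
    ]


def query_keywords(sparql):
    """Report which of the selected keywords occur in the query text.

    Naive: a keyword inside a string literal or IRI would also match.
    """
    flags = {}
    for kw in KEYWORDS:
        pattern = r"\b" + kw.replace(" ", r"\s+") + r"\b"
        flags[kw] = re.search(pattern, sparql, re.IGNORECASE) is not None
    return flags
```

Calling `question_features("http://example.org/q1", "Which rivers do not flow through Germany?")` with the hypothetical question IRI yields a length of 7 tokens, a positive negation flag, and the question word "which".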
Files
| Name | Size | MD5 |
|---|---|---|
| QALD_KG.zip | 2.4 GB | 5bc7d33a19e77f5ac502dec06c93bc3d |
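After downloading the archive, its integrity can be checked against the MD5 checksum listed for QALD_KG.zip. The sketch below uses only the Python standard library and reads the file in chunks so that the 2.4 GB archive does not have to fit in memory; the local path is an assumption about where the file was saved.

```python
import hashlib

# Checksum as published for QALD_KG.zip.
EXPECTED_MD5 = "5bc7d33a19e77f5ac502dec06c93bc3d"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path="QALD_KG.zip"):
    """Return True if the file at `path` matches the published checksum."""
    return md5_of_file(path) == EXPECTED_MD5
```

A result of `False` from `archive_is_intact()` most likely indicates an incomplete or corrupted download.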