There is a newer version of the record available.

Published March 19, 2026 | Version v2026.1
Dataset Open

QAWiki v2026.1: Knowledge Graph Question Answering (KGQA) / SPARQL Query Generation Dataset for Wikidata

  • 1. Instituto Milenio Fundamentos de los Datos
  • 2. DCC, Universidad de Chile
  • 3. Bielefeld University
  • 1. Leuphana University of Lüneburg

Description

This is a snapshot of QAWiki from 2026-03-19: a dataset for knowledge graph question answering (KGQA) and/or SPARQL query generation over Wikidata. This snapshot is published for use in the WikiKGQA 2026 challenge.

For the WikiKGQA challenge, please use the updated version instead: v2026.2.

The dataset contains questions in English and Spanish, each paired with a SPARQL query. Some questions also have Italian and Danish translations provided by the community.

The dataset is presented in two formats:

  • The full format is a TTL file containing a complete RDF dump of QAWiki, including entity mentions, relation mentions, question relations, quality tags, etc.
  • The WikiKGQA format is a JSON file that follows an extended form of the QALD schema and is extracted from the full format. It provides questions, queries, mentions and expected solutions, but omits some content: questions whose queries return empty results on the static Wikidata dump used for the challenge, alias queries, questions in languages other than English and Spanish, ambiguous questions, question relations, etc.
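As a rough illustration of how the WikiKGQA format might be consumed, the sketch below pulls question/query pairs for one language out of a QALD-style JSON document. It assumes the plain QALD layout (a top-level "questions" list, per-language "question" entries with "language" and "string" keys, and a "query" object with a "sparql" key). The extended schema used by QAWiki may name or nest fields differently, and the sample record is made up for illustration rather than taken from the dataset.

```python
import json

# Made-up sample in plain QALD-style JSON (an assumption, not an actual
# QAWiki record); the real file's extended fields and key names may differ.
SAMPLE = """
{
  "questions": [
    {
      "id": "1",
      "question": [
        {"language": "en", "string": "What is the capital of Chile?"},
        {"language": "es", "string": "¿Cuál es la capital de Chile?"}
      ],
      "query": {"sparql": "SELECT ?capital WHERE { wd:Q298 wdt:P36 ?capital }"}
    }
  ]
}
"""


def pairs_for_language(dataset, language):
    """Return (question_text, sparql) tuples for one language code."""
    pairs = []
    for entry in dataset.get("questions", []):
        sparql = entry.get("query", {}).get("sparql")
        for variant in entry.get("question", []):
            # Keep only the requested language, and skip entries without a query.
            if variant.get("language") == language and sparql:
                pairs.append((variant["string"], sparql))
    return pairs


dataset = json.loads(SAMPLE)
for text, sparql in pairs_for_language(dataset, "es"):
    print(text, "->", sparql)
```

To use this on the real file, replace `json.loads(SAMPLE)` with `json.load(...)` on the unzipped JSON file, and adjust the key names after inspecting one record.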

Files

qawiki-complete.ttl.zip

Files (4.5 MB)

Checksum Size
md5:9d8c78ee021501d1448123a6c7d00383 425.5 kB
md5:dbd6cf5a64bf7b2b916874859f7e4e85 4.1 MB
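The MD5 checksums listed above can be used to check that a download is intact. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, and because the listing does not show which checksum belongs to which file, the pairing of path and checksum in the usage comment is an assumption to verify against the record page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Hypothetical usage (the file/checksum pairing is an assumption):
# matches_listed_md5("qawiki-complete.ttl.zip",
#                    "md5:dbd6cf5a64bf7b2b916874859f7e4e85")
```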