There is a newer version of the record available.

Published August 10, 2025 | Version v1
Dataset Open

QAWiki v1: Knowledge Graph Question Answering (KGQA) / SPARQL Query Generation Dataset for Wikidata

  • 1. DCC, Universidad de Chile
  • 2. Instituto Milenio Fundamentos de los Datos

Description

This is a snapshot of QAWiki from 2025-09-09: a dataset for knowledge graph question answering (KGQA) and/or SPARQL query generation over Wikidata.

The dataset is presented in two formats:

  • The simple format is a TSV file containing language-tagged questions and paraphrases paired with SPARQL queries.
  • The full format is a TTL (Turtle) file containing a full RDF dump of QAWiki, which also includes entity mentions, relation mentions, question relations, quality tags, etc.
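
As a rough illustration of working with the simple format, the sketch below reads tab-separated rows and separates a language tag from a question string. The column layout of the TSV is not documented here, so the code makes no assumption about column order. The `"text"@lang` literal shape and the helper names `read_rows` and `split_lang_tag` are assumptions made for this example, not part of the dataset's documentation.

```python
import csv
import io
import re

# Assumption: a language-tagged value looks like  "text"@en  (RDF literal style).
# The real TSV may encode language tags differently; adjust the pattern if so.
LANG_TAGGED = re.compile(
    r'^"?(?P<text>.*?)"?@(?P<lang>[A-Za-z]{2,3}(?:-[A-Za-z0-9]+)*)$'
)


def split_lang_tag(value):
    """Return (text, lang) for a tagged value, or (value, None) if untagged."""
    stripped = value.strip()
    match = LANG_TAGGED.match(stripped)
    if match is None:
        return stripped, None
    return match.group("text"), match.group("lang")


def read_rows(stream):
    """Read tab-separated rows from a text stream, skipping empty lines.

    QUOTE_NONE keeps double quotes verbatim, since they may be part of a
    literal or a SPARQL query rather than CSV quoting.
    """
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if row]


if __name__ == "__main__":
    # Hypothetical row, made up for illustration only (not taken from QAWiki).
    sample = '"Who wrote Hamlet?"@en\tSELECT ?x WHERE { ?x ?p ?o }\n'
    for row in read_rows(io.StringIO(sample)):
        print([split_lang_tag(cell) for cell in row])
```

For the real file, pass an open file handle (for example `open(path, encoding="utf-8", newline="")`) to `read_rows` and inspect the first few rows to confirm the actual column order before relying on it.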

The dataset contains 518 questions in English and Spanish, each paired with a SPARQL query, plus 8 additional ambiguous questions without queries. Some questions also have Italian and Danish translations provided by the community.

Files

Total size: 2.9 MB

  • md5:9f2d8ecae766576b3e3c23c9565debff (2.6 MB)
  • md5:f2be2d6c795665e2523e7df82575c532 (226.2 kB)
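
The MD5 checksums listed above can be used to confirm that a download is intact. A minimal sketch using Python's standard `hashlib` module follows; the function names `md5_of_file` and `matches_checksum` are chosen for this example, and the file path is whatever name the downloaded file has locally.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with an expected value such as 'md5:9f2d...'."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

Calling `matches_checksum(local_path, "md5:9f2d8ecae766576b3e3c23c9565debff")` on the larger of the two files should return `True` if the download is complete and unmodified.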