Complex Sequential Question Answering dataset

Saha, Amrita; Pahuja, Vardaan; Khapra, Mitesh; Sankaranarayanan, Karthik; Chandar, Sarath

doi:10.5281/zenodo.3268649

Published March 9, 2018 | Version v1

Resource type: Dataset | Access: Open


While conversing with chatbots, humans tend to ask many questions, a significant portion of which can be answered by referring to large-scale knowledge graphs (KG). While Question Answering (QA) and dialog systems have been studied independently, there is a need to study them jointly to evaluate the real-world scenarios that bots face when both tasks are involved. To this end, we introduce the task of Complex Sequential QA, which combines the two tasks of (i) answering factual questions through complex inferencing over a realistic-sized KG of millions of entities, and (ii) learning to converse through a series of coherently linked QA pairs. Through a labor-intensive semi-automatic process involving in-house and crowdsourced workers, we created a dataset containing around 200K dialogs with a total of 1.6M turns. Further, unlike existing large-scale QA datasets, which contain simple questions that can be answered from a single tuple, the questions in our dialogs require a larger subgraph of the KG. Specifically, our dataset has questions that require logical, quantitative, and comparative reasoning, as well as their combinations. This calls for models that can: (i) parse complex natural language questions, (ii) use conversation context to resolve coreferences and ellipsis in utterances, (iii) ask for clarifications for ambiguous queries, and finally (iv) retrieve relevant subgraphs of the KG to answer such questions. However, our experiments with a combination of state-of-the-art dialog and QA models show that they do not achieve the above objectives and are inadequate for dealing with such complex real-world settings. We believe that this new dataset, coupled with the limitations of existing models as reported in this paper, should encourage further research in Complex Sequential QA.

Please visit https://amritasaha1812.github.io/CSQA/ for more details.

Files

Name	Size	MD5
CSQA_v9.zip	628.7 MB	0c698b2f642ada7bc34dd36c43cc6b74
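Because the archive is large, it can be worth checking a download against the MD5 value listed above before unpacking it. The following is a minimal sketch, not part of the dataset's own tooling: the helper names md5_of_file and verify_archive are hypothetical, and the local path CSQA_v9.zip is an assumption about where the file was saved.

```python
import hashlib
from pathlib import Path

# MD5 value copied from the file listing on this page.
EXPECTED_MD5 = "0c698b2f642ada7bc34dd36c43cc6b74"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the expected hex digest."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the archive was saved.
    archive = Path("CSQA_v9.zip")
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

Reading in chunks keeps memory use flat regardless of archive size. MD5 is used here only because it is the digest the page publishes; it detects corrupted downloads but is not a defense against deliberate tampering.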

	All versions	This version
Views	2,804	2,792
Downloads	1,941	1,934
Data volume	2.9 TB	2.9 TB
