There is a newer version of the record available.

Published August 25, 2020 | Version 2
Dataset Open

Cause-Effect-Context from Natural Questions (NQ-CE)

  • 1. RPI
  • 2. IBM Research

Description

This dataset is derived from the Natural Questions (NQ) dataset, a large benchmark for open-domain question answering research (https://ai.google.com/research/NaturalQuestions).

This dataset contains a collection of cause-effect pairs, each with its context (the text describing the causal relation between the cause and the effect) and the original question from the NQ dataset. It also contains a collection of "negative" pairs: pairs of phrases that are both mentioned in the context but have no causal relation.

The dataset was constructed by selecting questions in the NQ dataset that follow a certain pattern indicating that the question is causal. In each causal pair, either the cause or the effect is the (short) answer in the original NQ dataset, and the other side was manually derived from the context.

The data is shared in JSONL format: each line is one JSON object representing a processed NQ question. In version 2, each JSON object has the following fields:

  • phrase1: the first phrase (text span)
  • phrase2: the second phrase (text span)
  • label: "causal" means phrase1 causes phrase2; "non_causal" means phrase1 and phrase2 do NOT have a causal relation between them
  • passage: the context that states that phrase1 causes phrase2 (for causal) or just the passage that has both phrase1 and phrase2 (for non_causal).
  • document_url: the Wikipedia URL from the Natural Questions data
  • question_text: the original question text from the Natural Questions data
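
The field list above can be read with a few lines of Python. The following is a minimal sketch, not an official loader: the helper names `read_records` and `causal_pairs` are hypothetical, and the two sample records are made up for illustration and are not taken from the dataset. It only assumes what the description states, namely one JSON object per line with the six version 2 fields.

```python
import json
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# Field names as listed in the version 2 description above.
FIELDS = ("phrase1", "phrase2", "label", "passage", "document_url", "question_text")


def read_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSONL line, checking the documented fields."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = [name for name in FIELDS if name not in record]
        if missing:
            raise ValueError(f"line {line_no}: missing fields {missing}")
        yield record


def causal_pairs(records: Iterable[dict]) -> List[Tuple[str, str]]:
    """Return (cause, effect) tuples for records labelled "causal"."""
    return [(r["phrase1"], r["phrase2"]) for r in records if r["label"] == "causal"]


# Made-up records for illustration only; they are NOT from the dataset.
sample_lines = [
    json.dumps({
        "phrase1": "heavy rainfall",
        "phrase2": "flooding",
        "label": "causal",
        "passage": "Heavy rainfall caused flooding in the valley.",
        "document_url": "https://en.wikipedia.org/wiki/Example",
        "question_text": "what caused the flooding in the valley",
    }),
    json.dumps({
        "phrase1": "heavy rainfall",
        "phrase2": "the valley",
        "label": "non_causal",
        "passage": "Heavy rainfall caused flooding in the valley.",
        "document_url": "https://en.wikipedia.org/wiki/Example",
        "question_text": "what caused the flooding in the valley",
    }),
]

records = list(read_records(sample_lines))
label_counts = Counter(r["label"] for r in records)
pairs = causal_pairs(records)
print(label_counts)
print(pairs)
```

To read the actual file, an open file handle can be passed in place of `sample_lines`, since `read_records` accepts any iterable of lines; the file name to open depends on the download and is not stated in this record.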

License: https://creativecommons.org/licenses/by-sa/3.0/

Contacts:
Gaurav Dass: dassg2 AT rpi.edu
Oktie Hassanzadeh

Files

Total size: 238.4 kB

  • md5:6ae65787657ddc897bcd2f6d6ef60285 (238.4 kB)