QReCC - Question Rewriting in Conversational Context
====================================================
2021-10-01

QReCC contains 14K conversations with 81K question-answer pairs.

See the GitHub repository [1] for more information.

In the version published here, qrecc-test.json has an additional field, "Passages", that contains the IDs of the passages that (1) were retrieved by one of the methods from the paper [2] and (2) have a token-overlap F1 of at least 0.8 with the human answer to the question.
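
The token-overlap criterion described above can be sketched roughly as follows. This is a minimal illustration, not the code used to build the dataset: the lowercasing, the whitespace tokenization, and the names token_f1 and passages_matching_answer are assumptions, since the exact tokenization is not specified here.

```python
from collections import Counter


def token_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 between two strings.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection: a shared token counts min(count_a, count_b) times.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def passages_matching_answer(retrieved: dict, answer: str, threshold: float = 0.8) -> list:
    """Return the IDs of retrieved passages whose F1 with the answer is >= threshold.

    `retrieved` is a hypothetical mapping from passage ID to passage text.
    """
    return [pid for pid, text in retrieved.items() if token_f1(text, answer) >= threshold]
```

With this sketch, a passage whose text is identical to the answer scores 1.0 and is kept, while a passage sharing no tokens with the answer scores 0.0 and is dropped.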

The file passages.zip contains the passages created as described in the GitHub repository [3] (i.e., after applying paragraph_chunker.py).


SCAI-QReCC-21
-------------
The dataset is used in the SCAI QReCC'21 shared task [4]. This collection contains the following files:

- Training dataset
  - scai-qrecc21-training-turns.json
    Adaptation of qrecc-training.json. The fields "Rewrite", "Passages", and "Answer" are renamed to "Truth_rewrite", "Truth_passages", and "Truth_answer", respectively. A field "Transformer_rewrite" is added that contains the question rewrites by the "Transformer++" approach from the paper [2]. Truth passages for answers with fewer than six tokens are removed. The "Turn_no" values are made sequential (a few numbers are skipped in the original files).
  - scai-qrecc21-training-questions.json
    Generated from scai-qrecc21-training-turns.json [5]. Contains the "Conversation_no", "Turn_no", and "Question" for each turn.
  - scai-qrecc21-training-questions-rewritten.json
    Generated from scai-qrecc21-training-turns.json [5]. Contains the "Conversation_no", "Turn_no", and "Question" for each turn, but the "Question" is the "Truth_rewrite" if one exists (and the original "Question" otherwise).
  - scai-qrecc21-training-ground-truth.json
    Generated from scai-qrecc21-training-turns.json [5]. Contains the "Truth_rewrite", "Truth_passages", and "Truth_answer" fields for each turn.

- Toy dataset (for testing that approaches work in general)
  - scai-qrecc21-toy-questions.json
    Contains the data of the first conversation of scai-qrecc21-training-questions.json.
  - scai-qrecc21-toy-questions-rewritten.json
    Contains the data of the first conversation of scai-qrecc21-training-questions-rewritten.json.
  - scai-qrecc21-toy-ground-truth.json
    Contains the data of the first conversation of scai-qrecc21-training-ground-truth.json.

- Test dataset
  - scai-qrecc21-test-turns.json
    The same as scai-qrecc21-training-turns.json, but for the test set.
  - scai-qrecc21-test-questions.json
    The same as scai-qrecc21-training-questions.json, but for the test set.
  - scai-qrecc21-test-questions-rewritten.json
    The same as scai-qrecc21-training-questions-rewritten.json, but for the test set.
  - scai-qrecc21-test-ground-truth.json
    The same as scai-qrecc21-training-ground-truth.json, but for the test set.

- Baseline submission
  - scai-qrecc21-naacl-baseline.json
    Run file for the end-to-end approach from the paper [2] on the test set, used as a baseline in the task.
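
The relationship between the turns file and the three files derived from it can be sketched in Python as below. This is an illustration only and not the script referenced in [5]. It assumes that the turns file is a JSON list of objects with the field names listed above, that an empty "Truth_rewrite" counts as missing, and that the ground-truth records also carry "Conversation_no" and "Turn_no". The function names split_turns and first_conversation are hypothetical.

```python
TRUTH_FIELDS = ("Truth_rewrite", "Truth_passages", "Truth_answer")


def split_turns(turns):
    """Split turn records into questions, rewritten questions, and ground truth."""
    questions, rewritten, truth = [], [], []
    for turn in turns:
        # Identifier shared by all three derived records (assumed layout).
        key = {"Conversation_no": turn["Conversation_no"], "Turn_no": turn["Turn_no"]}
        questions.append({**key, "Question": turn["Question"]})
        # Use the rewrite if present and non-empty, else the original question.
        rewritten.append({**key, "Question": turn.get("Truth_rewrite") or turn["Question"]})
        truth.append({**key, **{field: turn.get(field) for field in TRUTH_FIELDS}})
    return questions, rewritten, truth


def first_conversation(records):
    """Return the records of the first conversation, as in the toy dataset.

    Assumption: the first record in the list belongs to the first conversation.
    """
    if not records:
        return []
    first = records[0]["Conversation_no"]
    return [record for record in records if record["Conversation_no"] == first]
```

Loading scai-qrecc21-training-turns.json with json.load and passing the result to split_turns would give three lists that correspond to the questions, questions-rewritten, and ground-truth files, and first_conversation applied to each list would give the matching toy subset, provided the assumptions above hold.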

References
----------
[1] https://github.com/apple/ml-qrecc
[2] https://arxiv.org/abs/2010.04898
[3] https://github.com/apple/ml-qrecc/tree/main/collection
[4] https://scai.info/scai-qrecc/
[5] https://github.com/scai-conf/SCAI-QReCC-21/blob/main/code/util/turns_split_data_and_truth.sh