QReCC - Question Rewriting in Conversational Context
====================================================
2021-10-01

QReCC contains 14K conversations with 81K question-answer pairs. See the GitHub repository [1] for more information.

In the version published here, qrecc-test.json has an additional field, "Passages", that contains the IDs of the passages that (1) have been retrieved by one of the methods from the paper [2] and (2) have a token overlap F1 of at least 0.8 with the human answer for the question.

The passages.zip contains the passages created as described in the GitHub repository [3] (i.e., after applying paragraph_chunker.py).


SCAI-QReCC-21
-------------
The dataset is used in the SCAI QReCC'21 shared task [4]. This collection contains the following files:

- Training dataset
  - scai-qrecc21-training-turns.json
    Adaptation of qrecc-training.json. The fields "Rewrite", "Passages", and "Answer" are renamed to "Truth_rewrite", "Truth_passages", and "Truth_answer", respectively. A field "Transformer_rewrite" is added that contains the question rewrites produced by the "Transformer++" approach from the paper [2]. Truth passages for answers with fewer than six tokens are removed. The "Turn_no" values are made sequential (a few numbers are skipped in the original files).
  - scai-qrecc21-training-questions.json
    Generated from scai-qrecc21-training-turns.json [5]. Contains the "Conversation_no", "Turn_no", and "Question" for each turn.
  - scai-qrecc21-training-questions-rewritten.json
    Generated from scai-qrecc21-training-turns.json [5]. Contains the "Conversation_no", "Turn_no", and "Question" for each turn, but the "Question" is the "Truth_rewrite" if one exists (the original "Question" otherwise).
  - scai-qrecc21-training-ground-truth.json
    Generated from scai-qrecc21-training-turns.json [5]. Contains the "Truth_rewrite", "Truth_passages", and "Truth_answer" fields for each turn.
- Toy dataset (for testing that approaches work in general)
  - scai-qrecc21-toy-questions.json
    Contains the data of the first conversation of scai-qrecc21-training-questions.json.
  - scai-qrecc21-toy-questions-rewritten.json
    Contains the data of the first conversation of scai-qrecc21-training-questions-rewritten.json.
  - scai-qrecc21-toy-ground-truth.json
    Contains the data of the first conversation of scai-qrecc21-training-ground-truth.json.
- Test dataset
  - scai-qrecc21-test-turns.json
    The same as scai-qrecc21-training-turns.json, but for the test set.
  - scai-qrecc21-test-questions.json
    The same as scai-qrecc21-training-questions.json, but for the test set.
  - scai-qrecc21-test-questions-rewritten.json
    The same as scai-qrecc21-training-questions-rewritten.json, but for the test set.
  - scai-qrecc21-test-ground-truth.json
    The same as scai-qrecc21-training-ground-truth.json, but for the test set.
- Baseline submission
  - scai-qrecc21-naacl-baseline.json
    Run file for the end-to-end approach from the paper [2] on the test set, used as a baseline in the task.


References
----------
[1] https://github.com/apple/ml-qrecc
[2] https://arxiv.org/abs/2010.04898
[3] https://github.com/apple/ml-qrecc/tree/main/collection
[4] https://scai.info/scai-qrecc/
[5] https://github.com/scai-conf/SCAI-QReCC-21/blob/main/code/util/turns_split_data_and_truth.sh
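

Example: deriving the per-turn files
------------------------------------
The questions, questions-rewritten, and ground-truth files listed above are generated from the turns files by the shell script [5]. The Python sketch below is not that script; it only illustrates the field selection described in this README. It assumes that a turns file is a JSON list of turn objects, that ground-truth records also carry "Conversation_no" and "Turn_no", and that a missing or empty "Truth_rewrite" counts as not existing. The names split_turns and toy_subset are hypothetical, and the demo records are invented.

```python
import json

# Ground-truth fields named in this README.
TRUTH_FIELDS = ("Truth_rewrite", "Truth_passages", "Truth_answer")


def split_turns(turns):
    """Derive the questions, questions-rewritten, and ground-truth views.

    `turns` is assumed to be a list of dicts, one per turn.
    """
    questions, rewritten, truth = [], [], []
    for turn in turns:
        # Assumption: every derived record is keyed by conversation and turn.
        key = {
            "Conversation_no": turn["Conversation_no"],
            "Turn_no": turn["Turn_no"],
        }
        questions.append({**key, "Question": turn["Question"]})
        # Use the human rewrite if present, the original question otherwise.
        rewrite = turn.get("Truth_rewrite")
        rewritten.append({**key, "Question": rewrite or turn["Question"]})
        # Copy only the ground-truth fields that the turn actually has.
        truth.append({**key, **{f: turn[f] for f in TRUTH_FIELDS if f in turn}})
    return questions, rewritten, truth


def toy_subset(records):
    """Keep only the records of the first conversation (toy dataset).

    Assumption: the first conversation is the one of the first record.
    """
    if not records:
        return []
    first = records[0]["Conversation_no"]
    return [r for r in records if r["Conversation_no"] == first]


if __name__ == "__main__":
    # Invented demo records, not taken from the dataset.
    demo = [
        {"Conversation_no": 1, "Turn_no": 1, "Question": "Who wrote it?",
         "Truth_rewrite": "Who wrote Dune?", "Truth_passages": ["p1"],
         "Truth_answer": "Frank Herbert"},
        {"Conversation_no": 2, "Turn_no": 1, "Question": "What is QReCC?",
         "Truth_passages": [], "Truth_answer": ""},
    ]
    questions, rewritten, truth = split_turns(demo)
    print(json.dumps(toy_subset(rewritten), indent=2))
```

To try this on the real data, load scai-qrecc21-training-turns.json with json.load and pass the result to split_turns; whether the output matches the published files exactly depends on the assumptions stated above.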