QReCC - Question Rewriting in Conversational Context
====================================================
QReCC contains 14K conversations with 81K question-answer pairs. See the GitHub repository [1] for more information.

In the version published here, qrecc-test.json has an additional field, "Passages", that contains the IDs of the passages that (1) were retrieved by one of the methods from the paper [2] and (2) have a token-overlap F1 of at least 0.8 with the human answer to the question.

passages.zip contains the passages created as described in the GitHub repository [3] (i.e., after applying paragraph_chunker.py).

SCAI-QReCC-21
-------------
The test dataset is used in the SCAI QReCC'21 shared task [4]. This collection contains the following files:

- scai-qrecc21-turns.json
  Adaptation of qrecc-test.json. The fields "Rewrite", "Passages", and "Answer" are renamed to "Truth_rewrite", "Truth_passages", and "Truth_answer", respectively. A field "Transformer_rewrite" is added that contains the question rewrites produced by the "Transformer++" approach from the paper [2].

- scai-qrecc21-questions.json
  Generated from scai-qrecc21-turns.json [5]. Contains the "Conversation_no", "Turn_no", "Context", and "Question" for each turn.

- scai-qrecc21-questions-with-rewrites.json
  Generated from scai-qrecc21-turns.json [5]. Contains the "Conversation_no", "Turn_no", "Context", and "Question" for each turn and for each decontextualized turn, that is, a turn with empty context whose question is (1) left unrewritten, (2) rewritten with the "Transformer++" approach from the paper [2], or (3) rewritten by a human (i.e., the content of the "Truth_rewrite" field). New "Conversation_no" values are created for these additional turns.

- scai-qrecc21-ground-truth.json
  Generated from scai-qrecc21-turns.json [5]. Contains the "Truth_rewrite", "Truth_passages", and "Truth_answer" fields for each turn. The "Conversation_no" and "Turn_no" are provided for the different rewrites (see scai-qrecc21-questions-with-rewrites.json) in `Turns.*rewrite-type*`, where the rewrite type is one of "model" (not decontextualized), "original" (no rewrite), "transformer" (rewritten with the "Transformer++" approach from the paper [2]), and "human" (the rewrite is equal to the "Truth_rewrite" field).

- scai-qrecc21-naacl-baseline.json
  Run file for the end-to-end approach from the paper [2], used as a baseline in the task.

References
----------
[1] https://github.com/apple/ml-qrecc
[2] https://arxiv.org/abs/2010.04898
[3] https://github.com/apple/ml-qrecc/tree/main/collection
[4] https://scai.info/scai-qrecc/
[5] https://github.com/scai-conf/SCAI-QReCC-21/blob/main/code/util/turns_split_data_and_truth.sh
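
The "Passages" field of qrecc-test.json is described above as keeping only passages whose token-overlap F1 with the human answer is at least 0.8. The following is a minimal sketch of one common way to compute such a score, not the code used to build the dataset. The function names `token_f1` and `matches_answer` are hypothetical, and lowercased whitespace tokenization is an assumption, since this README does not state how tokens were obtained.

```python
from collections import Counter


def token_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 between two strings.

    Assumption: lowercased whitespace tokenization. The tokenization used
    for the published "Passages" field is not specified in this README.
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        # Two empty strings count as a perfect match, otherwise no overlap.
        return float(cand_tokens == ref_tokens)
    # Multiset intersection: a token counts at most as often as it
    # occurs in both strings.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def matches_answer(text: str, answer: str, threshold: float = 0.8) -> bool:
    """Apply the 'at least 0.8' criterion (threshold value from the README)."""
    return token_f1(text, answer) >= threshold
```

With this sketch, `token_f1` is symmetric in its two arguments, and the threshold comparison is inclusive to mirror the "above or equal to 0.8" wording. Scores that land exactly on the threshold may be affected by floating-point rounding.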