SCAI-QReCC-21
-------------
This dataset was created for the SCAI QReCC'21 shared task [1]. It is derived from the original QReCC dataset [2,3], split, reformatted and annotated. The collection contains the following files:

- Task datasets

  - turns.zip
    Dataset splits with turn numbers fixed:
    - scai-qrecc21-training-turns.json
      Adaptation of qrecc-training.json. The fields "Rewrite", "Passages", and "Answer" are renamed to "Truth_rewrite", "Truth_passages", and "Truth_answer", respectively. An added field "Transformer_rewrite" contains the question rewrites produced by the "Transformer++" approach from the paper [5]. Truth passages for answers with fewer than six tokens are removed. The "Turn_no" values are renumbered to be sequential (a few numbers are skipped in the original files).
    - scai-qrecc21-test-turns.json
      The same as scai-qrecc21-training-turns.json, but for the test set.

  - questions.zip
    Dataset splits with (original) conversational questions and original numbers that may be useful for merging with the original datasets (TREC CAsT, QuAC, NQ):
    - scai-qrecc21-training-questions.json
      Generated from scai-qrecc21-training-turns.json [4]. Contains the "Conversation_no", "Turn_no", and "Question" for each turn.
    - scai-qrecc21-toy-questions.json
      Contains the data of the first conversation of scai-qrecc21-training-questions.json.
    - scai-qrecc21-test-questions.json
      The same as scai-qrecc21-training-questions.json, but for the test set.

  - questions-rewritten.zip
    Dataset splits with rewritten (decontextualised, unambiguous) questions, derived from the original conversational questions and handcrafted by professional expert annotators:
    - scai-qrecc21-training-questions-rewritten.json
      Generated from scai-qrecc21-training-turns.json [4]. Contains the "Conversation_no", "Turn_no", and "Question" for each turn, where the "Question" is the "Truth_rewrite" if it exists (the original "Question" otherwise).
    - scai-qrecc21-toy-questions-rewritten.json
      Contains the data of the first conversation of scai-qrecc21-training-questions-rewritten.json.
    - scai-qrecc21-test-questions-rewritten.json
      The same as scai-qrecc21-training-questions-rewritten.json, but for the test set.

  - ground-truth.zip
    Dataset splits with the answers and question rewrites handcrafted by professional expert annotators:
    - scai-qrecc21-training-ground-truth.json
      Generated from scai-qrecc21-training-turns.json [4]. Contains the "Truth_rewrite", "Truth_passages", and "Truth_answer" fields for each turn.
    - scai-qrecc21-toy-ground-truth.json
      Contains the data of the first conversation of scai-qrecc21-training-ground-truth.json.
    - scai-qrecc21-test-ground-truth.json
      The same as scai-qrecc21-training-ground-truth.json, but for the test set.
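
The relationship between the turns file and the three derived files can be sketched in Python as follows. This is only an illustration and not the script referenced in [4]. It assumes that the turns file is a JSON list of turn objects, that an empty or missing "Truth_rewrite" means no rewrite exists, and that the ground-truth records also carry "Conversation_no" and "Turn_no". The function names split_turns and first_conversation are hypothetical.

```python
import json


def split_turns(turns):
    """Split turn records into questions, rewritten questions, and ground truth.

    Assumption: `turns` is a list of dicts with the fields described above.
    """
    questions, rewritten, truth = [], [], []
    for turn in turns:
        key = {"Conversation_no": turn["Conversation_no"], "Turn_no": turn["Turn_no"]}
        questions.append({**key, "Question": turn["Question"]})
        # Use the rewrite when one exists; otherwise keep the original question.
        rewrite = turn.get("Truth_rewrite")
        rewritten.append({**key, "Question": rewrite if rewrite else turn["Question"]})
        # Assumption: the keys are kept so that records can be matched later.
        truth.append({
            **key,
            "Truth_rewrite": turn.get("Truth_rewrite"),
            "Truth_passages": turn.get("Truth_passages"),
            "Truth_answer": turn.get("Truth_answer"),
        })
    return questions, rewritten, truth


def first_conversation(records):
    """Keep only the records of the first conversation, as in the toy files."""
    if not records:
        return []
    first = records[0]["Conversation_no"]
    return [r for r in records if r["Conversation_no"] == first]


if __name__ == "__main__":
    # Made-up example data, not taken from the dataset.
    example = [
        {"Conversation_no": 1, "Turn_no": 1, "Question": "Who wrote it?",
         "Truth_rewrite": "Who wrote Dune?", "Truth_passages": ["p1"],
         "Truth_answer": "Frank Herbert"},
        {"Conversation_no": 2, "Turn_no": 1, "Question": "What is QReCC?",
         "Truth_rewrite": "", "Truth_passages": [], "Truth_answer": ""},
    ]
    questions, rewritten, truth = split_turns(example)
    print(json.dumps(first_conversation(rewritten), indent=2))
```

To apply the sketch to the real data, the example list would be replaced by the result of json.load on scai-qrecc21-training-turns.json.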


- Task results

  - scai-qrecc-21-valid-runs.zip
    Runs submitted by the task participants (4 teams), separately for conversational and rewritten questions.

  - answer_plausibility_annotations.csv
    A sample of answers from the submitted runs with the original crowdsourced annotations collected using Amazon Mechanical Turk (anonymised). Each sampled answer was annotated as plausible, implausible, or malformed with respect to the rewritten question (without conversational context).

  - answer_plausibility_annotations_clean_with_disagreements.csv
    Derived from answer_plausibility_annotations.csv by processing the quality assurance comments and dropping duplicate rows and redundant columns.

  - answer_plausibility_annotations_clean_without_disagreements.csv
    Derived from answer_plausibility_annotations_clean_with_disagreements.csv by automatically dropping all rows whose annotations disagree, i.e., disagreements are resolved by deletion.
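
Resolving disagreements by deletion can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used to produce the file. The column names "Question", "Answer", and "Annotation" are hypothetical and may not match the actual CSV headers, and the function name drop_disagreements is made up for this example.

```python
import csv
import io
from collections import defaultdict


def drop_disagreements(rows, key_fields, label_field):
    """Keep only the rows whose annotated item received a single distinct label."""
    labels = defaultdict(set)
    for row in rows:
        labels[tuple(row[f] for f in key_fields)].add(row[label_field])
    return [
        row for row in rows
        if len(labels[tuple(row[f] for f in key_fields)]) == 1
    ]


if __name__ == "__main__":
    # Made-up sample with hypothetical column names.
    sample = io.StringIO(
        "Question,Answer,Annotation\n"
        "q1,a1,plausible\n"
        "q1,a1,implausible\n"
        "q2,a2,plausible\n"
    )
    rows = list(csv.DictReader(sample))
    for row in drop_disagreements(rows, ["Question", "Answer"], "Annotation"):
        print(row)
```

Grouping by a tuple of key columns keeps the sketch independent of how an annotated answer is identified in the real file; only the key_fields and label_field arguments would need to change.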



References
----------
[1] https://scai.info/scai-qrecc
[2] https://github.com/apple/ml-qrecc
[3] https://zenodo.org/record/5115890#.Ya3vNi8RppS
[4] https://github.com/scai-conf/SCAI-QReCC-21/blob/main/code/util/turns_split_data_and_truth.sh
[5] https://arxiv.org/abs/2010.04898