Published September 15, 2022 | Version 0.0.0
Dataset Open

IDN-Sum

  • 1. University of Southampton

Description

Summarizing Interactive Digital Narratives (IDN) poses unique challenges for existing text summarization models, particularly in capturing interactive elements in addition to important plot points. In this paper, we describe the first IDN dataset (IDN-Sum) designed specifically for training and testing IDN text summarization algorithms. The dataset is generated from random playthroughs of 8 IDN episodes, taken from 2 different IDN games, and consists of 10,000 documents. Playthrough documents are annotated through automatic alignment with fan-sourced summaries using a commonly used alignment algorithm. The dataset is released as open source so that future researchers can train and test their own approaches for IDN text.

The Annotated Data folder contains the IDN-Sum data that was automatically annotated using the alignment algorithm. Its subfolders hold versions of the data formatted as input for BertSum (bs) from the TransformerSum library (https://github.com/HHousen/TransformerSum) and for SummaRuNNer (sr) (implementation at https://github.com/hpzhao/SummaRuNNer). They are named using the convention [model_name]_[summary_length].
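The naming convention can be unpacked programmatically when iterating over the subfolders. The following is a minimal sketch, assuming the model prefix is one of the two abbreviations given above (bs, sr) and that a single underscore separates it from the summary length. The example folder name "bs_3" is hypothetical and is not taken from the archive, and the summary length is kept as a string because its unit is not stated here.

```python
from typing import NamedTuple

# Abbreviations taken from the description above.
MODEL_NAMES = {"bs": "BertSum", "sr": "SummaRuNNer"}


class Variant(NamedTuple):
    model: str
    summary_length: str  # kept as text; the unit is not specified in the description


def parse_variant(folder_name: str) -> Variant:
    """Split a folder name of the form [model_name]_[summary_length]."""
    prefix, sep, length = folder_name.partition("_")
    if not sep or not length or prefix not in MODEL_NAMES:
        raise ValueError(f"Unexpected folder name: {folder_name!r}")
    return Variant(MODEL_NAMES[prefix], length)


# Hypothetical folder name, for illustration only.
print(parse_variant("bs_3"))
```

Names that do not match the pattern raise a ValueError, so any unexpected entries in the folder are surfaced rather than silently skipped.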

Unannotated playthroughs can be found in the Cleaned Data folder.

Files

dataset.zip (3.8 GB)
md5:5eebde01c64f43475952f1d171c067f5
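
The md5 checksum listed for dataset.zip can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard hashlib module. The local path "dataset.zip" is an assumption about where the file was saved, and the helper names are illustrative.

```python
import hashlib

# Checksum as listed for dataset.zip on this page.
EXPECTED_MD5 = "5eebde01c64f43475952f1d171c067f5"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in 1 MiB chunks so the 3.8 GB archive is never held in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file at `path` matches the expected md5 digest."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local path; adjust to wherever the archive was downloaded.
    print(verify_download("dataset.zip"))
```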