Published October 22, 2023 | Version 1.0.0
Dataset Open

Webis-Context-SciSumm-2023

  • 1. Leipzig University
  • 2. University of Groningen

Description

Webis-Context-SciSumm-2023 is a large-scale dataset for studying contextualized summarization of scientific papers. The corpus contains approximately 540K computer science papers encompassing 4.6M citation texts, together with the information from the cited papers that is relevant to each citation. The provided subset (approximately 25K papers) contains abstractive summaries of that relevant content, generated by the LLaMA (V1) and Vicuna (13B) models.

Because of computational constraints, summaries for the complete dataset are not yet included; they will be added once generation is complete.

Files

context-scisumm-subset.zip

Files (10.1 GB)

Name Size
md5:aecb4f141dd50698d587ad70dea3c873 240.6 MB
md5:54a52ca41b56fcc91914f6498849f92d 9.9 GB
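
The MD5 checksums listed above can be used to check that a download arrived intact. The following is a minimal sketch, not part of the dataset's own tooling: the helper names `md5_of_file` and `matches_checksum` are hypothetical, and the local file path in the usage comment is an assumption, since the listing does not show which file name belongs to which checksum.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat even for multi-gigabyte archives.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 digest with a value such as 'md5:<hex>'.

    The optional 'md5:' prefix used in the file listing is stripped first.
    """
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


# Example usage (hypothetical local path; pair it with the checksum
# shown next to the file you actually downloaded):
#
#   ok = matches_checksum(
#       "downloads/context-scisumm-subset.zip",
#       "md5:aecb4f141dd50698d587ad70dea3c873",
#   )
#   print("checksum ok" if ok else "checksum mismatch, re-download the file")
```

A mismatch usually indicates a truncated or corrupted download, so fetching the file again is the usual first step.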