Published October 22, 2023 | Version 1.0.0
Dataset Open


  • 1. Leipzig University
  • 2. University of Groningen


Webis-Context-SciSumm-2023 is a large-scale dataset for studying contextualized summarization of scientific papers. The corpus contains approximately 540K computer science papers encompassing 4.6M citation texts, together with the information relevant to each citation from the cited papers. The provided subset (approximately 25K papers) contains abstractive summaries of the relevant content generated by the LLaMA (V1) and Vicuna (13B) models.

Summaries for the complete dataset will be added once they have been generated; they are delayed by computational constraints.


Files (10.1 GB)

Name Size
240.6 MB
9.9 GB