Published October 19, 2020 | Version v1
Dataset Open

Webis EditorialSum Corpus 2020

  • 1. Leipzig University
  • 2. German Aerospace Centre (DLR)
  • 3. Bauhaus Universität, Weimar

Description

The Webis EditorialSum Corpus consists of 1330 manually curated extractive summaries for 266 news editorials from three diverse portals: Al-Jazeera, Guardian, and Fox News. Each editorial has 5 summaries, each labeled for overall quality and for fine-grained properties such as thesis-relevance, persuasiveness, reasonableness, and self-containedness.

The files are organized as follows:


corpus.csv - Contains all the editorials and their acquired summaries


Note: X ranges from 1 to 5, one value for each of the five summaries.

- article_id : Article ID in the corpus
- title : Title of the editorial
- article_text : Plain text of the editorial
- summary_{X}_text : Plain text of the corresponding summary
- thesis_{X}_text : Plain text of the thesis from the corresponding summary
- lead : top 15% of the editorial's segments
- body : segments between lead and conclusion sections
- conclusion : bottom 15% of the editorial's segments
- article_segments : Collection of paragraphs, each further divided into a collection of segments containing:
  { "number": segment order in the editorial,
    "text": segment text,
    "label": ADU type
  }
- summary_{X}_segments : Collection of summary segments containing:
  { "number": segment order in the editorial,
    "text": segment text,
    "adu_label": ADU type from the editorial,
    "summary_label": can be 'thesis' or 'justification'
  }
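
As a minimal sketch of how these columns could be read, the Python snippet below expands the X placeholder into column names and decodes a segments cell. It assumes, without confirmation from this description, that corpus.csv has a header row with exactly the names listed above, that the file is UTF-8 encoded, and that segment cells are serialized as JSON or as Python literals. The helper names summary_columns, parse_segments, and load_rows are illustrative and not part of the corpus.

```python
import ast
import csv
import json


def summary_columns(x):
    """Column names for summary number x (1 to 5), following the naming above."""
    if not 1 <= x <= 5:
        raise ValueError("x must be between 1 and 5")
    return {
        "text": f"summary_{x}_text",
        "thesis": f"thesis_{x}_text",
        "segments": f"summary_{x}_segments",
    }


def parse_segments(cell):
    """Decode a segments cell.

    ASSUMPTION: the cell is encoded as JSON or as a Python literal;
    the actual serialization is not stated in the description.
    """
    try:
        return json.loads(cell)
    except json.JSONDecodeError:
        return ast.literal_eval(cell)


def load_rows(path="corpus.csv"):
    """Read all rows as dictionaries keyed by the header names."""
    # Long text cells may exceed the csv module's default field size limit.
    csv.field_size_limit(10**9)
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Under these assumptions, parse_segments(row[summary_columns(1)["segments"]]) would return the segments of the first summary for a row returned by load_rows.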


quality-groups.csv - Contains the IDs of the high- and low-quality summaries for each quality dimension per editorial

For example, article_id 2 has four high_quality summaries (summary_1, summary_2, summary_3, summary_4) and one low_quality summary (summary_5) in terms of overall quality.
The corresponding summary texts can be obtained from corpus.csv.
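
The lookup from summary IDs to texts can be sketched in Python as follows. The column layout of quality-groups.csv is not described here, so the sketch takes the IDs as lists that have already been read from that file. It assumes that appending "_text" to an ID such as summary_5 gives the matching corpus.csv column. The function name summary_texts and the toy row with placeholder texts are hypothetical.

```python
def summary_texts(corpus_row, summary_ids):
    """Map IDs such as 'summary_5' to the text in the matching corpus.csv column."""
    return {sid: corpus_row[f"{sid}_text"] for sid in summary_ids}


# Toy row standing in for the corpus.csv entry of article_id 2 (placeholder texts).
row = {"article_id": "2"}
row.update({f"summary_{x}_text": f"<summary {x}>" for x in range(1, 6)})

# Groups taken from the overall-quality example above.
high = summary_texts(row, ["summary_1", "summary_2", "summary_3", "summary_4"])
low = summary_texts(row, ["summary_5"])
```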

Files

Files (10.8 MB)

md5:b3053455c6c58580570c9e30390f7d62 (10.7 MB)
md5:117cb5a5712a3772b0da9ab254a331d7 (95.0 kB)