Dataset Open Access
Syed, Shahbaz; El Baff, Roxanne; Al-Khatib, Khalid; Kiesel, Johannes; Stein, Benno; Potthast, Martin
The Webis EditorialSum Corpus consists of 1330 manually curated extractive summaries for 266 news editorials from three diverse portals: Al-Jazeera, Guardian, and Fox News. Each editorial has five summaries, and each summary is labeled for overall quality and for fine-grained properties such as thesis-relevance, persuasiveness, reasonableness, and self-containedness.
The files are organized as follows:
corpus.csv - Contains all the editorials and their acquired summaries
Note: X ranges from 1 to 5, one value per summary.
- article_id : Article ID in the corpus
- title : Title of the editorial
- article_text : Plain text of the editorial
- summary_{X}_text : Plain text of the corresponding summary
- thesis_{X}_text : Plain text of the thesis from the corresponding summary
- lead : top 15% of the editorial's segments
- body : segments between lead and conclusion sections
- conclusion : bottom 15% of the editorial's segments
- article_segments: Collection of paragraphs, each further divided into a collection of segments containing:
{ "number": segment order in the editorial,
"text" : segment text,
"label": ADU (argumentative discourse unit) type
}
- summary_{X}_segments: Collection of summary segments containing:
{ "number": segment order in the editorial,
"text" : segment text,
"adu_label": ADU type from the editorial,
"summary_label": can be 'thesis' or 'justification'
}
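Reading the segment columns means decoding nested structures and, for the lead/body/conclusion view, applying the 15% cut. The following is a minimal, unofficial sketch that assumes the segment columns are serialized as JSON (falling back to Python literals) and that the 15% share is rounded up; neither detail is documented here, and the helper names (load_corpus, parse_cell, flatten_article_segments, split_lead_body_conclusion, summary_parts) are hypothetical.

```python
import ast
import csv
import json
from typing import Any, Dict, List, Tuple

# Editorial texts plus nested segment columns may exceed the csv module's
# default field size limit (131072 characters), so raise it generously.
csv.field_size_limit(10**8)


def load_corpus(path: str) -> List[Dict[str, str]]:
    """Read corpus.csv into one dict per editorial (assumes UTF-8 encoding)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def parse_cell(cell: str) -> Any:
    """Decode a serialized segments cell.

    Assumption: cells hold JSON; if not, fall back to a Python literal
    (e.g. single-quoted dicts)."""
    try:
        return json.loads(cell)
    except json.JSONDecodeError:
        return ast.literal_eval(cell)


def flatten_article_segments(cell: str) -> List[Dict[str, Any]]:
    """Flatten article_segments (paragraphs -> segments) into one list
    ordered by each segment's "number" field."""
    paragraphs = parse_cell(cell)
    segments = [seg for paragraph in paragraphs for seg in paragraph]
    return sorted(segments, key=lambda seg: int(seg["number"]))


def split_lead_body_conclusion(
    segments: List[Any], percent: int = 15
) -> Tuple[List[Any], List[Any], List[Any]]:
    """Split segments into lead (top 15%), body, and conclusion (bottom 15%).

    Assumption: the share is rounded up with integer arithmetic and capped
    at half the segments; the corpus itself may round differently."""
    n = len(segments)
    k = min((n * percent + 99) // 100, n // 2)
    return segments[:k], segments[k:n - k], segments[n - k:]


def summary_parts(cell: str) -> Dict[str, List[str]]:
    """Group a summary_{X}_segments cell into thesis and justification texts,
    keeping the editorial's segment order."""
    parts: Dict[str, List[str]] = {"thesis": [], "justification": []}
    for seg in sorted(parse_cell(cell), key=lambda s: int(s["number"])):
        parts.setdefault(seg["summary_label"], []).append(seg["text"])
    return parts
```

For instance, with `rows = load_corpus("corpus.csv")`, calling `flatten_article_segments(rows[0]["article_segments"])` would yield the first editorial's segments in order, and `summary_parts(rows[0]["summary_1_segments"])["thesis"]` the thesis segments of its first summary, provided the serialization assumption holds. Comparing the recomputed split against the lead, body, and conclusion columns is a quick way to check the rounding assumption.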
quality-groups.csv - Contains, for each editorial and each quality dimension, the IDs of the high-quality and low-quality summaries
For example, article_id 2 has four high_quality summaries (summary_1, summary_2, summary_3, summary_4) and one low_quality summary (summary_5) in terms of overall quality.
The corresponding summary texts can be looked up in corpus.csv.
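Joining the two files amounts to mapping each listed summary ID to the summary_{X}_text column of the same article_id in corpus.csv. A minimal sketch, assuming only that quality-groups.csv cells contain IDs written like summary_1 (the exact column layout and cell serialization are not documented here); all helper names are hypothetical:

```python
import re
from typing import Dict, Iterable, List

# Matches IDs "summary_1" ... "summary_5" but not, e.g., "summary_12".
SUMMARY_ID = re.compile(r"summary_([1-5])(?!\d)")


def summary_ids(cell: str) -> List[str]:
    """Extract summary IDs from one quality-groups.csv cell.

    Assumption: the cell's serialization (list literal, comma-separated, ...)
    is undocumented, so IDs are pattern-matched instead of parsed."""
    return [f"summary_{n}" for n in SUMMARY_ID.findall(cell)]


def index_by_article(corpus_rows: Iterable[Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Map article_id to its corpus.csv row for quick lookups."""
    return {row["article_id"]: row for row in corpus_rows}


def summary_texts(ids: Iterable[str], corpus_row: Dict[str, str]) -> Dict[str, str]:
    """Fetch each summary's text from the summary_{X}_text column of its editorial."""
    return {sid: corpus_row[f"{sid}_text"] for sid in ids}
```

Pattern-matching the IDs rather than parsing the cell keeps the sketch independent of how the list is stored; if the real layout turns out to be one column per summary, only summary_ids would need to change.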
Name | MD5 checksum | Size
---|---|---
corpus.csv | b3053455c6c58580570c9e30390f7d62 | 10.7 MB
quality-groups.csv | 117cb5a5712a3772b0da9ab254a331d7 | 95.0 kB
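The MD5 checksums listed for the two files can be used to confirm that a download is complete and uncorrupted. A minimal sketch using Python's standard hashlib module; the helper names and local file paths are assumptions:

```python
import hashlib
from typing import BinaryIO

# Checksums as listed for this dataset version.
EXPECTED_MD5 = {
    "corpus.csv": "b3053455c6c58580570c9e30390f7d62",
    "quality-groups.csv": "117cb5a5712a3772b0da9ab254a331d7",
}


def md5_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, name: str) -> bool:
    """Return True if the file at `path` matches the listed checksum for `name`."""
    with open(path, "rb") as f:
        return md5_of_stream(f) == EXPECTED_MD5[name]
```

For example, `verify_download("corpus.csv", "corpus.csv")` should return True for an intact copy saved under that name.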