Webis EditorialSum Corpus 2020

doi:10.5281/zenodo.4105765

Published October 19, 2020 | Version v1

Dataset Open

1. Leipzig University
2. German Aerospace Centre (DLR)
3. Bauhaus Universität, Weimar

The Webis EditorialSum Corpus consists of 1330 manually curated extractive summaries for 266 news editorials from three diverse portals: Al-Jazeera, Guardian, and Fox News. Each editorial has 5 summaries (266 x 5 = 1330), and each summary is labeled for overall quality and for fine-grained properties such as thesis-relevance, persuasiveness, reasonableness, and self-containedness.

The files are organized as follows:

corpus.csv - Contains all the editorials and their acquired summaries

Note: X ranges from 1 to 5, one index for each of the five summaries.

- article_id : Article ID in the corpus
- title : Title of the editorial
- article_text : Plain text of the editorial
- summary_{X}_text : Plain text of the corresponding summary
- thesis_{X}_text : Plain text of the thesis from the corresponding summary
- lead : first 15% of the editorial's segments
- body : segments between the lead and conclusion sections
- conclusion : last 15% of the editorial's segments
- article_segments : Collection of paragraphs, each further divided into a collection of segments, each containing:
{ "number": segment order in the editorial,
"text" : segment text,
"label" : ADU (argumentative discourse unit) type
}
- summary_{X}_segments : Collection of summary segments, each containing:
{ "number": segment order in the editorial,
"text" : segment text,
"adu_label": ADU type from the editorial,
"summary_label": either 'thesis' or 'justification'
}
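The column naming and the 15% section boundaries described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' tooling: the helper names `summary_pairs` and `split_sections` are hypothetical, the sample row is placeholder data rather than corpus content, and the rounding rule for the 15% cut (floor) is a guess, since the description does not specify it. How the segment collections are serialized inside a CSV cell is also not stated, so they are not parsed here.

```python
import csv
import io

NUM_SUMMARIES = 5  # five summaries per editorial, per the description above


def summary_pairs(row):
    """Return [(summary_text, thesis_text), ...] for X = 1..5 from one CSV row."""
    return [
        (row[f"summary_{x}_text"], row[f"thesis_{x}_text"])
        for x in range(1, NUM_SUMMARIES + 1)
    ]


def split_sections(segments, percent=15):
    """Split ordered segments into (lead, body, conclusion).

    Assumption: the cut size is floor(percent% of n); the corpus
    description does not say how fractional counts are rounded.
    """
    n = len(segments)
    k = (n * percent) // 100
    return segments[:k], segments[k:n - k], segments[n - k:]


# Tiny in-memory stand-in for corpus.csv (placeholder values, not corpus data).
header = ["article_id"]
for x in range(1, NUM_SUMMARIES + 1):
    header += [f"summary_{x}_text", f"thesis_{x}_text"]
sample = io.StringIO()
writer = csv.writer(sample)
writer.writerow(header)
writer.writerow(["0"] + [f"placeholder {name}" for name in header[1:]])
sample.seek(0)

row = next(csv.DictReader(sample))
pairs = summary_pairs(row)

# Twenty numbered segments: 15% of 20 is 3 segments at each end.
lead, body, conclusion = split_sections(list(range(1, 21)))
print(len(pairs), lead, len(body), conclusion)
```

For the real file, the same `csv.DictReader` call would presumably be pointed at an opened corpus.csv instead of the in-memory sample.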

quality-groups.csv - Contains the IDs of the high-quality and low-quality summaries for each quality dimension, per editorial

For example, in terms of overall quality, article_id 2 has four high_quality summaries (summary_1, summary_2, summary_3, summary_4) and one low_quality summary (summary_5).
The corresponding summary texts can be looked up in corpus.csv.
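A lookup from summary IDs to their texts might look like the sketch below. It relies only on the ID form shown in the example (summary_1 ... summary_5) and on the summary_{X}_text columns of corpus.csv. The exact column layout of quality-groups.csv is not described here, so the ID lists are passed in directly; the function names and the placeholder row are hypothetical.

```python
def summary_text_column(summary_id):
    """Map an ID such as 'summary_3' to the corpus.csv column 'summary_3_text'."""
    prefix, _, index = summary_id.partition("_")
    if prefix != "summary" or not index.isdigit() or not 1 <= int(index) <= 5:
        raise ValueError(f"unexpected summary ID: {summary_id!r}")
    return f"summary_{int(index)}_text"


def resolve_summaries(summary_ids, corpus_row):
    """Return {summary_id: summary_text} for one editorial's row of corpus.csv."""
    return {sid: corpus_row[summary_text_column(sid)] for sid in summary_ids}


# Placeholder row standing in for article_id 2 (not real corpus text).
corpus_row = {f"summary_{x}_text": f"placeholder text {x}" for x in range(1, 6)}

# Groups taken from the overall-quality example above.
high_quality = ["summary_1", "summary_2", "summary_3", "summary_4"]
low_quality = ["summary_5"]

resolved = resolve_summaries(low_quality, corpus_row)
print(resolved)
```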

Files

Files (10.8 MB)

Name	Size	MD5
corpus.csv	10.7 MB	b3053455c6c58580570c9e30390f7d62
quality-groups.csv	95.0 kB	117cb5a5712a3772b0da9ab254a331d7
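After downloading, the files can be checked against the MD5 checksums listed above. The following is a small sketch using Python's standard `hashlib`; the helper names `md5_hex` and `matches_listing` are illustrative, and the local file paths are whatever the download location happens to be.

```python
import hashlib

# Checksums as published in the file listing.
EXPECTED_MD5 = {
    "corpus.csv": "b3053455c6c58580570c9e30390f7d62",
    "quality-groups.csv": "117cb5a5712a3772b0da9ab254a331d7",
}


def md5_hex(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, name):
    """Return True if the file at `path` has the checksum listed for `name`."""
    return md5_hex(path) == EXPECTED_MD5[name]
```

A call such as `matches_listing("corpus.csv", "corpus.csv")` would then return `True` for an intact download in the current directory.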

