Dataset Open Access

Webis EditorialSum Corpus 2020

Syed, Shahbaz; El Baff, Roxanne; Al-Khatib, Khalid; Kiesel, Johannes; Stein, Benno; Potthast, Martin


Citation Style Language JSON Export

{
  "publisher": "Zenodo", 
  "DOI": "10.5281/zenodo.4105765", 
  "language": "eng", 
  "title": "Webis EditorialSum Corpus 2020", 
  "issued": {
    "date-parts": [
      [
        2020, 
        10, 
        19
      ]
    ]
  }, 
  "abstract": "<p>The Webis EditorialSum Corpus consists of 1330 manually curated extractive summaries for 266 news editorials spanning three diverse portals: Al-Jazeera, Guardian and Fox News. Each editorial has 5 summaries, each labeled for overall quality and fine grained properties such as thesis-relevance, persuasiveness, reasonableness, self-containedness.</p>\n\n<p>The files are organized as follows:</p>\n\n<p><br>\n<em>corpus.csv</em> - <strong>Contains all the editorials and their acquired summaries</strong></p>\n\n<p><br>\nNote: (X = [1,5] for five summaries)</p>\n\n<p>- article_id : Article ID in the corpus<br>\n- title : Title of the editorial<br>\n- article_text : Plain text of the editorial<br>\n- summary_{X}_text : Plain text of the corresponding summary<br>\n- thesis_{X}_text : Plain text of the thesis from the corresponding summary<br>\n- lead : top 15% of the editorial&#39;s segments<br>\n- body : segments between lead and conclusion sections<br>\n- conclusion : bottom 15% of the editorial&#39;s segments<br>\n- article_segments: Collection of paragraphs, each further divided into collection of segments containing:<br>\n&nbsp;{ &quot;number&quot;: segment order in the editorial,<br>\n&nbsp;&nbsp; &quot;text&quot; : segment text,<br>\n&nbsp;&nbsp; &quot;label&quot;: ADU type<br>\n&nbsp;}<br>\n- summary_{X}_segments: Collection of summary segments containing:<br>\n{ &quot;number&quot;: segment order in the editorial,<br>\n&nbsp; &quot;text&quot; : segment text,<br>\n&nbsp; &quot;adu_label&quot;: ADU type from the editorial,<br>\n&nbsp; &quot;summary_label&quot;: can be &#39;thesis&#39; or &#39;justification&#39;<br>\n}</p>\n\n<p><br>\n<em>quality-groups.csv</em> - <strong>Contains the IDs for high(and low)-quality summaries for each quality dimension per editorial</strong><br>\n<br>\nFor example: article_id 2 has four high_quality summaries (summary_1, summary_2, summary_3, summary_4) and one low_quality summary (summary_5) in terms of overall quality.<br>\nThe summary texts can be obtained from corpus.csv respectively.</p>\n\n<p>&nbsp;</p>\n\n<p>&nbsp;</p>\n\n<p>&nbsp;</p>", 
  "author": [
    {
      "family": "Syed, Shahbaz"
    }, 
    {
      "family": "El Baff, Roxanne"
    }, 
    {
      "family": "Al-Khatib, Khalid"
    }, 
    {
      "family": "Kiesel, Johannes"
    }, 
    {
      "family": "Stein, Benno"
    }, 
    {
      "family": "Potthast, Martin"
    }
  ], 
  "type": "dataset", 
  "id": "4105765"
}
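
The column layout described in the abstract can be illustrated with a minimal Python sketch. It assumes that corpus.csv is a standard comma-separated file with a header row and that the per-summary columns follow the summary_{X}_text pattern for X = 1..5, as the abstract states. The helper names `summary_columns` and `read_summaries` are hypothetical and not part of the dataset; the sample row is made up for illustration.

```python
import csv
import io

NUM_SUMMARIES = 5  # the abstract states X ranges over [1, 5]


def summary_columns(x):
    """Column names for summary number x, following the naming pattern in the abstract."""
    return {
        "text": f"summary_{x}_text",
        "thesis": f"thesis_{x}_text",
        "segments": f"summary_{x}_segments",
    }


def read_summaries(csv_file):
    """Yield (article_id, title, summary_texts) for each row of a corpus.csv-like file.

    Assumption: the file has a header row. Summary columns that are
    missing from the header are returned as empty strings.
    """
    reader = csv.DictReader(csv_file)
    for row in reader:
        texts = [
            row.get(summary_columns(x)["text"], "")
            for x in range(1, NUM_SUMMARIES + 1)
        ]
        yield row["article_id"], row["title"], texts


# Usage with a tiny in-memory sample (illustrative data, not from the corpus).
sample = "article_id,title,summary_1_text,summary_2_text\n2,Example,first,second\n"
for article_id, title, texts in read_summaries(io.StringIO(sample)):
    print(article_id, title, texts)
```

For the real file, the same function could be called on `open("corpus.csv", newline="", encoding="utf-8")`; the encoding is an assumption. If a field exceeds the csv module's default size limit, `csv.field_size_limit` can be raised before reading.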
Statistics        All versions   This version
Views             56             56
Downloads         63             63
Data volume       548.5 MB       548.5 MB
Unique views      48             48
Unique downloads  38             38
