Dataset Open Access

Webis EditorialSum Corpus 2020

Syed, Shahbaz; El Baff, Roxanne; Al-Khatib, Khalid; Kiesel, Johannes; Stein, Benno; Potthast, Martin


JSON-LD (schema.org) Export

{
  "inLanguage": {
    "alternateName": "eng", 
    "@type": "Language", 
    "name": "English"
  }, 
  "description": "<p>The Webis EditorialSum Corpus consists of 1330 manually curated extractive summaries for 266 news editorials spanning three diverse portals: Al-Jazeera, Guardian and Fox News. Each editorial has five summaries, each labeled for overall quality and for fine-grained properties such as thesis relevance, persuasiveness, reasonableness, and self-containedness.</p>\n\n<p>The files are organized as follows:</p>\n\n<p><br>\n<em>corpus.csv</em> - <strong>Contains all the editorials and their acquired summaries</strong></p>\n\n<p><br>\nNote: X = [1,5] for the five summaries</p>\n\n<p>- article_id : Article ID in the corpus<br>\n- title : Title of the editorial<br>\n- article_text : Plain text of the editorial<br>\n- summary_{X}_text : Plain text of the corresponding summary<br>\n- thesis_{X}_text : Plain text of the thesis from the corresponding summary<br>\n- lead : top 15% of the editorial&#39;s segments<br>\n- body : segments between the lead and conclusion sections<br>\n- conclusion : bottom 15% of the editorial&#39;s segments<br>\n- article_segments: Collection of paragraphs, each further divided into a collection of segments containing:<br>\n&nbsp;{ &quot;number&quot;: segment order in the editorial,<br>\n&nbsp;&nbsp; &quot;text&quot; : segment text,<br>\n&nbsp;&nbsp; &quot;label&quot;: ADU type<br>\n&nbsp;}<br>\n- summary_{X}_segments: Collection of summary segments containing:<br>\n{ &quot;number&quot;: segment order in the editorial,<br>\n&nbsp; &quot;text&quot; : segment text,<br>\n&nbsp; &quot;adu_label&quot;: ADU type from the editorial,<br>\n&nbsp; &quot;summary_label&quot;: can be &#39;thesis&#39; or &#39;justification&#39;<br>\n}</p>\n\n<p><br>\n<em>quality-groups.csv</em> - <strong>Contains the IDs of the high- and low-quality summaries for each quality dimension per editorial</strong><br>\n<br>\nFor example: article_id 2 has four high_quality summaries (summary_1, summary_2, summary_3, summary_4) and one low_quality summary (summary_5) in terms of overall quality.<br>\nThe corresponding summary texts can be obtained from corpus.csv.</p>", 
  "license": "https://creativecommons.org/licenses/by/4.0/legalcode", 
  "creator": [
    {
      "affiliation": "Leipzig University", 
      "@type": "Person", 
      "name": "Syed, Shahbaz"
    }, 
    {
      "affiliation": "German Aerospace Centre (DLR)", 
      "@type": "Person", 
      "name": "El Baff, Roxanne"
    }, 
    {
      "affiliation": "Bauhaus Universit\u00e4t, Weimar", 
      "@type": "Person", 
      "name": "Al-Khatib, Khalid"
    }, 
    {
      "affiliation": "Bauhaus Universit\u00e4t, Weimar", 
      "@type": "Person", 
      "name": "Kiesel, Johannes"
    }, 
    {
      "affiliation": "Bauhaus Universit\u00e4t, Weimar", 
      "@type": "Person", 
      "name": "Stein, Benno"
    }, 
    {
      "affiliation": "Leipzig University", 
      "@type": "Person", 
      "name": "Potthast, Martin"
    }
  ], 
  "url": "https://zenodo.org/record/4105765", 
  "datePublished": "2020-10-19", 
  "keywords": [
    "editorial summarization", 
    "argumentation summarization", 
    "extractive summarization"
  ], 
  "@context": "https://schema.org/", 
  "distribution": [
    {
      "contentUrl": "https://zenodo.org/api/files/f8875f06-5ef5-4290-bb16-a432936b1516/corpus.csv", 
      "encodingFormat": "csv", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/f8875f06-5ef5-4290-bb16-a432936b1516/quality-groups.csv", 
      "encodingFormat": "csv", 
      "@type": "DataDownload"
    }
  ], 
  "identifier": "https://doi.org/10.5281/zenodo.4105765", 
  "@id": "https://doi.org/10.5281/zenodo.4105765", 
  "@type": "Dataset", 
  "name": "Webis EditorialSum Corpus 2020"
}
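The description field above documents the columns of corpus.csv but not how the nested `article_segments` and `summary_{X}_segments` cells are serialized. The following Python sketch, using only the standard library, decodes those cells, flattens `article_segments` into editorial order, recomputes a 15% lead/body/conclusion split, and separates a summary's thesis segments from its justification segments. It rests on unverified assumptions: that each segment cell holds JSON or Python-literal text, and that "15%" is rounded up. The helper names (`parse_segments`, `load_corpus`, and so on) are hypothetical and not part of the corpus release.

```python
import ast
import csv
import io
import json
from typing import Any, TextIO


def parse_segments(raw: str) -> Any:
    """Decode a serialized segments cell.

    Assumption: the cell holds JSON or a Python literal (e.g. single-quoted
    dicts); inspect corpus.csv and keep only the branch you actually need.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return ast.literal_eval(raw)


def flatten_article_segments(paragraphs: list[list[dict]]) -> list[dict]:
    """Flatten paragraphs -> segments, ordered by their "number" field."""
    segments = [seg for paragraph in paragraphs for seg in paragraph]
    return sorted(segments, key=lambda seg: int(seg["number"]))


def lead_body_conclusion(segments: list, percent: int = 15) -> tuple[list, list, list]:
    """Split ordered segments into lead (top 15%), body, conclusion (bottom 15%).

    Assumption: 15% of the segment count is rounded up and capped at half,
    so lead and conclusion never overlap; the corpus' own rule may differ.
    """
    n = len(segments)
    k = min((n * percent + 99) // 100, n // 2)  # integer ceil(n * percent / 100)
    return segments[:k], segments[k:n - k], segments[n - k:]


def split_summary(segments: list[dict]) -> tuple[list[str], list[str]]:
    """Return (thesis, justification) texts of one summary, in editorial order."""
    ordered = sorted(segments, key=lambda seg: int(seg["number"]))
    thesis = [s["text"] for s in ordered if s["summary_label"] == "thesis"]
    justification = [s["text"] for s in ordered if s["summary_label"] == "justification"]
    return thesis, justification


def load_corpus(handle: TextIO) -> list[dict]:
    """Read corpus.csv rows and decode the article and summary_{1..5} segment cells."""
    csv.field_size_limit(2**31 - 1)  # long cells may exceed the default field size limit
    rows = []
    for row in csv.DictReader(handle):
        row["article_segments"] = flatten_article_segments(parse_segments(row["article_segments"]))
        for x in range(1, 6):
            row[f"summary_{x}_segments"] = parse_segments(row[f"summary_{x}_segments"])
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Toy, made-up row; the "label"/"adu_label" values are placeholders, not real ADU types.
    paragraphs = [[{"number": 2, "text": "B.", "label": "x"}, {"number": 1, "text": "A.", "label": "y"}]]
    summary = [
        {"number": 2, "text": "B.", "adu_label": "x", "summary_label": "justification"},
        {"number": 1, "text": "A.", "adu_label": "y", "summary_label": "thesis"},
    ]
    fields = ["article_id", "article_segments"] + [f"summary_{x}_segments" for x in range(1, 6)]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerow({"article_id": "1", "article_segments": json.dumps(paragraphs),
                     **{f"summary_{x}_segments": json.dumps(summary) for x in range(1, 6)}})
    buf.seek(0)
    row = load_corpus(buf)[0]
    print([s["text"] for s in row["article_segments"]])  # ['A.', 'B.']
    print(split_summary(row["summary_1_segments"]))  # (['A.'], ['B.'])
```

To read the real file, pass a handle from `open("corpus.csv", newline="", encoding="utf-8")` to `load_corpus` (the UTF-8 encoding is itself an assumption). Comparing the recomputed split with the provided `lead`, `body`, and `conclusion` columns is one way to check the rounding assumption before relying on it.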