Dataset Open Access

Webis EditorialSum Corpus 2020

Syed, Shahbaz; El Baff, Roxanne; Al-Khatib, Khalid; Kiesel, Johannes; Stein, Benno; Potthast, Martin


DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd">
  <identifier identifierType="DOI">10.5281/zenodo.4105765</identifier>
  <creators>
    <creator>
      <creatorName>Syed, Shahbaz</creatorName>
      <givenName>Shahbaz</givenName>
      <familyName>Syed</familyName>
      <affiliation>Leipzig University</affiliation>
    </creator>
    <creator>
      <creatorName>El Baff, Roxanne</creatorName>
      <givenName>Roxanne</givenName>
      <familyName>El Baff</familyName>
      <affiliation>German Aerospace Centre (DLR)</affiliation>
    </creator>
    <creator>
      <creatorName>Al-Khatib, Khalid</creatorName>
      <givenName>Khalid</givenName>
      <familyName>Al-Khatib</familyName>
      <affiliation>Bauhaus-Universität Weimar</affiliation>
    </creator>
    <creator>
      <creatorName>Kiesel, Johannes</creatorName>
      <givenName>Johannes</givenName>
      <familyName>Kiesel</familyName>
      <affiliation>Bauhaus-Universität Weimar</affiliation>
    </creator>
    <creator>
      <creatorName>Stein, Benno</creatorName>
      <givenName>Benno</givenName>
      <familyName>Stein</familyName>
      <affiliation>Bauhaus-Universität Weimar</affiliation>
    </creator>
    <creator>
      <creatorName>Potthast, Martin</creatorName>
      <givenName>Martin</givenName>
      <familyName>Potthast</familyName>
      <affiliation>Leipzig University</affiliation>
    </creator>
  </creators>
  <titles>
    <title>Webis EditorialSum Corpus 2020</title>
  </titles>
  <publisher>Zenodo</publisher>
  <publicationYear>2020</publicationYear>
  <subjects>
    <subject>editorial summarization</subject>
    <subject>argumentation summarization</subject>
    <subject>extractive summarization</subject>
  </subjects>
  <dates>
    <date dateType="Issued">2020-10-19</date>
  </dates>
  <language>en</language>
  <resourceType resourceTypeGeneral="Dataset"/>
  <alternateIdentifiers>
    <alternateIdentifier alternateIdentifierType="url">https://zenodo.org/record/4105765</alternateIdentifier>
  </alternateIdentifiers>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.5281/zenodo.4105764</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf">https://zenodo.org/communities/webis</relatedIdentifier>
  </relatedIdentifiers>
  <rightsList>
    <rights rightsURI="https://creativecommons.org/licenses/by/4.0/legalcode">Creative Commons Attribution 4.0 International</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
  </rightsList>
  <descriptions>
    <description descriptionType="Abstract">&lt;p&gt;The Webis EditorialSum Corpus consists of 1,330 manually curated extractive summaries of 266 news editorials from three news portals: Al-Jazeera, the Guardian, and Fox News. Each editorial has five summaries, and each summary is labeled for overall quality and for fine-grained properties such as thesis relevance, persuasiveness, reasonableness, and self-containedness.&lt;/p&gt;

&lt;p&gt;The files are organized as follows:&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
&lt;em&gt;corpus.csv&lt;/em&gt; - &lt;strong&gt;Contains all editorials and their collected summaries&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
Note: X ranges from 1 to 5, one value per summary.&lt;/p&gt;

&lt;p&gt;- article_id: ID of the editorial in the corpus&lt;br&gt;
- title: Title of the editorial&lt;br&gt;
- article_text: Plain text of the editorial&lt;br&gt;
- summary_{X}_text: Plain text of summary X&lt;br&gt;
- thesis_{X}_text: Plain text of the thesis in summary X&lt;br&gt;
- lead: First 15% of the editorial&amp;#39;s segments&lt;br&gt;
- body: Segments between the lead and the conclusion&lt;br&gt;
- conclusion: Last 15% of the editorial&amp;#39;s segments&lt;br&gt;
- article_segments: List of paragraphs, each a list of segments with the fields:&lt;br&gt;
&amp;nbsp;{ &amp;quot;number&amp;quot;: position of the segment in the editorial,&lt;br&gt;
&amp;nbsp;&amp;nbsp; &amp;quot;text&amp;quot;: segment text,&lt;br&gt;
&amp;nbsp;&amp;nbsp; &amp;quot;label&amp;quot;: ADU (argumentative discourse unit) type&lt;br&gt;
&amp;nbsp;}&lt;br&gt;
- summary_{X}_segments: List of the segments in summary X, each with the fields:&lt;br&gt;
{ &amp;quot;number&amp;quot;: position of the segment in the editorial,&lt;br&gt;
&amp;nbsp; &amp;quot;text&amp;quot;: segment text,&lt;br&gt;
&amp;nbsp; &amp;quot;adu_label&amp;quot;: ADU type of the segment in the editorial,&lt;br&gt;
&amp;nbsp; &amp;quot;summary_label&amp;quot;: either &amp;#39;thesis&amp;#39; or &amp;#39;justification&amp;#39;&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
&lt;em&gt;quality-groups.csv&lt;/em&gt; - &lt;strong&gt;Contains, for each editorial and each quality dimension, the IDs of its high-quality and low-quality summaries&lt;/strong&gt;&lt;br&gt;
&lt;br&gt;
For example, in terms of overall quality, article_id 2 has four high_quality summaries (summary_1, summary_2, summary_3, summary_4) and one low_quality summary (summary_5).&lt;br&gt;
The corresponding summary texts can be looked up in corpus.csv.&lt;/p&gt;
</description>
  </descriptions>
</resource>
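The column layout of corpus.csv described in the abstract can be read with Python's standard library. The following is a minimal sketch, not part of the dataset release: it assumes that the segment columns (article_segments and summary_{X}_segments) are serialized as JSON, falling back to Python-literal syntax if a cell does not parse as JSON, and the helper names parse_segments, flat_article_segments, split_summary, and load_corpus are hypothetical.

```python
import ast
import csv
import json
from typing import Any


def parse_segments(raw: str) -> Any:
    """Decode a segment column from corpus.csv.

    Assumption: the cell holds a JSON array; if it is not valid JSON,
    fall back to a Python literal (e.g. single-quoted dicts).
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return ast.literal_eval(raw)


def flat_article_segments(row: dict[str, str]) -> list[dict[str, Any]]:
    """Flatten article_segments (paragraphs -> segments) into editorial order."""
    paragraphs = parse_segments(row["article_segments"])
    segments = [seg for paragraph in paragraphs for seg in paragraph]
    # "number" is the segment's position in the editorial.
    return sorted(segments, key=lambda seg: int(seg["number"]))


def split_summary(row: dict[str, str], x: int) -> tuple[list[str], list[str]]:
    """Return (thesis texts, justification texts) for summary X, 1 <= X <= 5."""
    thesis: list[str] = []
    justification: list[str] = []
    for seg in parse_segments(row[f"summary_{x}_segments"]):
        if seg["summary_label"] == "thesis":
            thesis.append(seg["text"])
        elif seg["summary_label"] == "justification":
            justification.append(seg["text"])
    return thesis, justification


def load_corpus(path: str) -> list[dict[str, str]]:
    """Read corpus.csv into a list of rows keyed by column name."""
    # Precaution: editorial texts and segment lists can be long cells,
    # so raise the csv module's per-field size limit.
    csv.field_size_limit(10**8)
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

For example, `rows = load_corpus("corpus.csv")` followed by `split_summary(rows[0], 1)` would return the thesis and justification segment texts of the first editorial's first summary, provided the serialization assumption holds.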