Dataset Open Access

Webis EditorialSum Corpus 2020

Syed, Shahbaz; El Baff, Roxanne; Al-Khatib, Khalid; Kiesel, Johannes; Stein, Benno; Potthast, Martin


MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nmm##2200000uu#4500</leader>
  <datafield tag="041" ind1=" " ind2=" ">
    <subfield code="a">eng</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">editorial summarization</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">argumentation summarization</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">extractive summarization</subfield>
  </datafield>
  <controlfield tag="005">20201019122657.0</controlfield>
  <controlfield tag="001">4105765</controlfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">German Aerospace Centre (DLR)</subfield>
    <subfield code="a">El Baff, Roxanne</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Bauhaus Universität, Weimar</subfield>
    <subfield code="a">Al-Khatib, Khalid</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Bauhaus Universität, Weimar</subfield>
    <subfield code="a">Kiesel, Johannes</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Bauhaus Universität, Weimar</subfield>
    <subfield code="a">Stein, Benno</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Leipzig University</subfield>
    <subfield code="a">Potthast, Martin</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">10733231</subfield>
    <subfield code="z">md5:b3053455c6c58580570c9e30390f7d62</subfield>
    <subfield code="u">https://zenodo.org/record/4105765/files/corpus.csv</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">94974</subfield>
    <subfield code="z">md5:117cb5a5712a3772b0da9ab254a331d7</subfield>
    <subfield code="u">https://zenodo.org/record/4105765/files/quality-groups.csv</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2020-10-19</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">openaire_data</subfield>
    <subfield code="p">user-webis</subfield>
    <subfield code="o">oai:zenodo.org:4105765</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">Leipzig University</subfield>
    <subfield code="a">Syed, Shahbaz</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Webis EditorialSum Corpus 2020</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">user-webis</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u">https://creativecommons.org/licenses/by/4.0/legalcode</subfield>
    <subfield code="a">Creative Commons Attribution 4.0 International</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;The Webis EditorialSum Corpus consists of 1330 manually curated extractive summaries for 266 news editorials from three diverse portals: Al-Jazeera, Guardian, and Fox News. Each editorial has five summaries, and each summary is labeled for overall quality and for fine-grained properties such as thesis-relevance, persuasiveness, reasonableness, and self-containedness.&lt;/p&gt;

&lt;p&gt;The files are organized as follows:&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
&lt;em&gt;corpus.csv&lt;/em&gt; - &lt;strong&gt;Contains all the editorials and their acquired summaries&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
Note: X ranges from 1 to 5, one index per summary.&lt;/p&gt;

&lt;p&gt;- article_id : Article ID in the corpus&lt;br&gt;
- title : Title of the editorial&lt;br&gt;
- article_text : Plain text of the editorial&lt;br&gt;
- summary_{X}_text : Plain text of the corresponding summary&lt;br&gt;
- thesis_{X}_text : Plain text of the thesis from the corresponding summary&lt;br&gt;
- lead : first 15% of the editorial&amp;#39;s segments&lt;br&gt;
- body : segments between the lead and conclusion sections&lt;br&gt;
- conclusion : last 15% of the editorial&amp;#39;s segments&lt;br&gt;
- article_segments : Collection of paragraphs, each further divided into a collection of segments containing:&lt;br&gt;
&amp;nbsp;{ &amp;quot;number&amp;quot;: segment order in the editorial,&lt;br&gt;
&amp;nbsp;&amp;nbsp; &amp;quot;text&amp;quot; : segment text,&lt;br&gt;
&amp;nbsp;&amp;nbsp; &amp;quot;label&amp;quot;: ADU type&lt;br&gt;
&amp;nbsp;}&lt;br&gt;
- summary_{X}_segments: Collection of summary segments containing:&lt;br&gt;
{ &amp;quot;number&amp;quot;: segment order in the editorial,&lt;br&gt;
&amp;nbsp; &amp;quot;text&amp;quot; : segment text,&lt;br&gt;
&amp;nbsp; &amp;quot;adu_label&amp;quot;: ADU type from the editorial,&lt;br&gt;
&amp;nbsp; &amp;quot;summary_label&amp;quot;: can be &amp;#39;thesis&amp;#39; or &amp;#39;justification&amp;#39;&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
&lt;em&gt;quality-groups.csv&lt;/em&gt; - &lt;strong&gt;Contains the IDs of the high- and low-quality summaries for each quality dimension, per editorial&lt;/strong&gt;&lt;br&gt;
&lt;br&gt;
For example: article_id 2 has four high_quality summaries (summary_1, summary_2, summary_3, summary_4) and one low_quality summary (summary_5) in terms of overall quality.&lt;br&gt;
The corresponding summary texts can be obtained from corpus.csv.&lt;/p&gt;

</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.4105764</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.4105765</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">dataset</subfield>
  </datafield>
</record>
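
The 856 fields in the record above carry, for each file, a size in bytes (subfield s), an MD5 checksum (subfield z), and a download URL (subfield u). The following is a minimal sketch of how those values could be read from the export and used to check a downloaded copy. It assumes Python with only the standard library; the names `list_files`, `md5_matches`, and `SAMPLE` are hypothetical helpers introduced for illustration, and `SAMPLE` is an abbreviated copy of the record that keeps only the two 856 fields.

```python
import hashlib
import xml.etree.ElementTree as ET

# Namespace declared on the <record> element of the MARC21 export.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Abbreviated copy of the record (illustration only): just the 856 fields.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">10733231</subfield>
    <subfield code="z">md5:b3053455c6c58580570c9e30390f7d62</subfield>
    <subfield code="u">https://zenodo.org/record/4105765/files/corpus.csv</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">94974</subfield>
    <subfield code="z">md5:117cb5a5712a3772b0da9ab254a331d7</subfield>
    <subfield code="u">https://zenodo.org/record/4105765/files/quality-groups.csv</subfield>
  </datafield>
</record>"""


def list_files(marc_xml):
    """Return one dict (url, size, md5) per 856 field of a MARC21 record."""
    root = ET.fromstring(marc_xml)
    files = []
    for field in root.findall("marc:datafield[@tag='856']", MARC_NS):
        # Map each subfield code to its text, e.g. {"s": "...", "z": "...", "u": "..."}.
        sub = {s.get("code"): (s.text or "")
               for s in field.findall("marc:subfield", MARC_NS)}
        files.append({
            "url": sub["u"],
            "size": int(sub["s"]),
            # Drop the "md5:" prefix and keep only the hex digest.
            "md5": sub["z"].partition(":")[2],
        })
    return files


def md5_matches(path, expected_hex, chunk_size=1 << 20):
    """Compare the MD5 digest of a local file against an expected hex digest."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex


if __name__ == "__main__":
    for entry in list_files(SAMPLE):
        print(entry["url"], entry["size"], entry["md5"])
```

After downloading a file, a call such as `md5_matches("corpus.csv", entry["md5"])` would indicate whether the local copy matches the checksum listed in the record; the local path here is an assumption about where the file was saved.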