Dataset Open Access

Webis ChangeMyView Corpus 2020 (Webis-CMV-20)

Al-Khatib, Khalid; Völske, Michael; Syed, Shahbaz; Kolyada, Nikolay; Stein, Benno


MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nmm##2200000uu#4500</leader>
  <datafield tag="041" ind1=" " ind2=" ">
    <subfield code="a">eng</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">social media</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">argumentation</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">persuasiveness</subfield>
  </datafield>
  <controlfield tag="005">20200513202042.0</controlfield>
  <controlfield tag="001">3778298</controlfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Bauhaus-Universität Weimar</subfield>
    <subfield code="0">(orcid)0000-0002-9283-6846</subfield>
    <subfield code="a">Völske, Michael</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Leipzig University</subfield>
    <subfield code="0">(orcid)0000-0002-4821-1507</subfield>
    <subfield code="a">Syed, Shahbaz</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Bauhaus-Universität Weimar</subfield>
    <subfield code="0">(orcid)0000-0002-6493-9557</subfield>
    <subfield code="a">Kolyada, Nikolay</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Bauhaus-Universität Weimar</subfield>
    <subfield code="0">(orcid)0000-0001-9033-2217</subfield>
    <subfield code="a">Stein, Benno</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">1651067128</subfield>
    <subfield code="z">md5:e41bdc8e1a48b900e89d1b8f55c820a0</subfield>
    <subfield code="u">https://zenodo.org/record/3778298/files/author_entity_category.jsonl.bz2</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">14720534</subfield>
    <subfield code="z">md5:3c5edd9ceeac9ddbe2d6d9e6695ec006</subfield>
    <subfield code="u">https://zenodo.org/record/3778298/files/author_liwc.jsonl.bz2</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">6040755</subfield>
    <subfield code="z">md5:76ab3db22c805e22847362dcf680911d</subfield>
    <subfield code="u">https://zenodo.org/record/3778298/files/author_subreddit_category.jsonl.bz2</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">7229520</subfield>
    <subfield code="z">md5:4dd316a68b35ae58ccf4b0d726f31817</subfield>
    <subfield code="u">https://zenodo.org/record/3778298/files/author_subreddit.jsonl.bz2</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">18093525</subfield>
    <subfield code="z">md5:f68d4129c7063488832f16e927dbfa1d</subfield>
    <subfield code="u">https://zenodo.org/record/3778298/files/pairs.jsonl.bz2</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">426602510</subfield>
    <subfield code="z">md5:5a413ea0dd5c6ee391b623588ec71e00</subfield>
    <subfield code="u">https://zenodo.org/record/3778298/files/posts_malleability.jsonl.bz2</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">660888940</subfield>
    <subfield code="z">md5:d9cd9aa21bf22d80dc298059af823310</subfield>
    <subfield code="u">https://zenodo.org/record/3778298/files/threads.jsonl.bz2</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2020-04-30</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">openaire_data</subfield>
    <subfield code="p">user-webis</subfield>
    <subfield code="o">oai:zenodo.org:3778298</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">Bauhaus-Universität Weimar</subfield>
    <subfield code="a">Al-Khatib, Khalid</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Webis ChangeMyView Corpus 2020 (Webis-CMV-20)</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">user-webis</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u">https://creativecommons.org/licenses/by/4.0/legalcode</subfield>
    <subfield code="a">Creative Commons Attribution 4.0 International</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;The Webis-CMV-20 dataset comprises all available posts and comments in the &lt;a href="https://reddit.com/r/changemyview"&gt;ChangeMyView&lt;/a&gt; subreddit from the foundation of the subreddit in 2005 until September 2017. From these, we have derived two sub-datasets for the tasks of persuasiveness prediction and opinion malleability prediction. In addition, the corpus comprises historical posts by CMV authors and derived personal characteristics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dataset specification&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;All files are in bzip2-compressed &lt;a href="http://jsonlines.org/"&gt;JSON Lines&lt;/a&gt; format.&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;&lt;strong&gt;threads.jsonl:&lt;/strong&gt; contains all the selected discussion threads from CMV&lt;/li&gt;
	&lt;li&gt;&lt;strong&gt;pairs.jsonl:&lt;/strong&gt; each record contains a submission, a delta comment, a non-delta comment, and the comments&amp;#39; similarity score&lt;/li&gt;
	&lt;li&gt;&lt;strong&gt;posts_malleability.jsonl:&lt;/strong&gt; contains posts for opinion malleability prediction, in the format provided in the original &lt;a href="https://files.pushshift.io/reddit/"&gt;Reddit Crawl&lt;/a&gt; dataset&lt;/li&gt;
	&lt;li&gt;&lt;strong&gt;author_entity_category.jsonl:&lt;/strong&gt; each record contains an author and the list of Wikipedia entities that author mentioned in messages across all subreddits. For each mentioned entity, we provide the following data:&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;&lt;code class="language-json"&gt;[title, wikidata_id, wikipedia_page_id, mentioned_entity_title, wikifier_score, subreddit_name, subreddit_id, subreddit_category_name, subreddit_topcategory_name]&lt;/code&gt;&lt;/pre&gt;

&lt;ul&gt;
	&lt;li&gt;&lt;strong&gt;author_liwc.jsonl:&lt;/strong&gt; personality trait features computed with &lt;a href="https://liwc.wpengine.com/"&gt;LIWC&lt;/a&gt; for the authors in the pairs.jsonl and posts_malleability.jsonl datasets&lt;/li&gt;
	&lt;li&gt;&lt;strong&gt;author_subreddit.jsonl:&lt;/strong&gt; for each author, statistics on the number of posts (submissions and comments) across all subreddits&lt;/li&gt;
	&lt;li&gt;&lt;strong&gt;author_subreddit_category.jsonl:&lt;/strong&gt; similar to author_subreddit.jsonl, but the per-author post statistics are grouped by top-categories and categories according to &lt;a href="https://snoopsnoo.com/subreddits/"&gt;snoopsnoo.com&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.3778297</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.3778298</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">dataset</subfield>
  </datafield>
</record>
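
The record above lists each file with its size (856 $s), an MD5 checksum (856 $z), and a download URL (856 $u), and the description states that all files are bzip2-compressed JSON Lines. As a minimal sketch of how a downloaded file might be checked and read, assuming Python 3 and a local copy of the file: the helper names `iter_jsonl_bz2` and `md5_matches` are hypothetical and not part of the dataset, and no field names inside the records are assumed.

```python
import bz2
import hashlib
import json
from typing import Iterator


def iter_jsonl_bz2(path: str) -> Iterator[dict]:
    """Yield one parsed JSON value per non-empty line of a bzip2-compressed JSON Lines file."""
    # bz2.open in text mode decompresses on the fly, so even the large
    # files can be streamed without loading them fully into memory.
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def md5_matches(path: str, expected: str, chunk_size: int = 1 << 20) -> bool:
    """Compare a file's MD5 digest with a checksum such as the 856 $z value."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Hash the compressed file in fixed-size chunks.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    # The record writes checksums as "md5:<hex>"; accept the bare hex form too.
    expected_hex = expected.split(":", 1)[-1].lower()
    return digest.hexdigest() == expected_hex
```

For example, after downloading pairs.jsonl.bz2, `md5_matches("pairs.jsonl.bz2", "md5:f68d4129c7063488832f16e927dbfa1d")` would compare the local copy against the checksum listed in the record, and `next(iter_jsonl_bz2("pairs.jsonl.bz2"))` would return the first record for inspection. Streaming line by line is a deliberate choice here, since threads.jsonl.bz2 and posts_malleability.jsonl.bz2 are several hundred megabytes even when compressed.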