Dataset Open Access
Al-Khatib, Khalid;
Völske, Michael;
Syed, Shahbaz;
Kolyada, Nikolay;
Stein, Benno
<?xml version='1.0' encoding='UTF-8'?> <record xmlns="http://www.loc.gov/MARC21/slim"> <leader>00000nmm##2200000uu#4500</leader> <datafield tag="041" ind1=" " ind2=" "> <subfield code="a">eng</subfield> </datafield> <datafield tag="653" ind1=" " ind2=" "> <subfield code="a">social media</subfield> </datafield> <datafield tag="653" ind1=" " ind2=" "> <subfield code="a">argumentation</subfield> </datafield> <datafield tag="653" ind1=" " ind2=" "> <subfield code="a">persuasiveness</subfield> </datafield> <controlfield tag="005">20200513202042.0</controlfield> <controlfield tag="001">3778298</controlfield> <datafield tag="700" ind1=" " ind2=" "> <subfield code="u">Bauhaus-Universität Weimar</subfield> <subfield code="0">(orcid)0000-0002-9283-6846</subfield> <subfield code="a">Völske, Michael</subfield> </datafield> <datafield tag="700" ind1=" " ind2=" "> <subfield code="u">Leipzig University</subfield> <subfield code="0">(orcid)0000-0002-4821-1507</subfield> <subfield code="a">Syed, Shahbaz</subfield> </datafield> <datafield tag="700" ind1=" " ind2=" "> <subfield code="u">Bauhaus-Universität Weimar</subfield> <subfield code="0">(orcid)0000-0002-6493-9557</subfield> <subfield code="a">Kolyada, Nikolay</subfield> </datafield> <datafield tag="700" ind1=" " ind2=" "> <subfield code="u">Bauhaus-Universität Weimar</subfield> <subfield code="0">(orcid)0000-0001-9033-2217</subfield> <subfield code="a">Stein, Benno</subfield> </datafield> <datafield tag="856" ind1="4" ind2=" "> <subfield code="s">1651067128</subfield> <subfield code="z">md5:e41bdc8e1a48b900e89d1b8f55c820a0</subfield> <subfield code="u">https://zenodo.org/record/3778298/files/author_entity_category.jsonl.bz2</subfield> </datafield> <datafield tag="856" ind1="4" ind2=" "> <subfield code="s">14720534</subfield> <subfield code="z">md5:3c5edd9ceeac9ddbe2d6d9e6695ec006</subfield> <subfield code="u">https://zenodo.org/record/3778298/files/author_liwc.jsonl.bz2</subfield> </datafield> <datafield tag="856" ind1="4" 
ind2=" "> <subfield code="s">6040755</subfield> <subfield code="z">md5:76ab3db22c805e22847362dcf680911d</subfield> <subfield code="u">https://zenodo.org/record/3778298/files/author_subreddit_category.jsonl.bz2</subfield> </datafield> <datafield tag="856" ind1="4" ind2=" "> <subfield code="s">7229520</subfield> <subfield code="z">md5:4dd316a68b35ae58ccf4b0d726f31817</subfield> <subfield code="u">https://zenodo.org/record/3778298/files/author_subreddit.jsonl.bz2</subfield> </datafield> <datafield tag="856" ind1="4" ind2=" "> <subfield code="s">18093525</subfield> <subfield code="z">md5:f68d4129c7063488832f16e927dbfa1d</subfield> <subfield code="u">https://zenodo.org/record/3778298/files/pairs.jsonl.bz2</subfield> </datafield> <datafield tag="856" ind1="4" ind2=" "> <subfield code="s">426602510</subfield> <subfield code="z">md5:5a413ea0dd5c6ee391b623588ec71e00</subfield> <subfield code="u">https://zenodo.org/record/3778298/files/posts_malleability.jsonl.bz2</subfield> </datafield> <datafield tag="856" ind1="4" ind2=" "> <subfield code="s">660888940</subfield> <subfield code="z">md5:d9cd9aa21bf22d80dc298059af823310</subfield> <subfield code="u">https://zenodo.org/record/3778298/files/threads.jsonl.bz2</subfield> </datafield> <datafield tag="542" ind1=" " ind2=" "> <subfield code="l">open</subfield> </datafield> <datafield tag="260" ind1=" " ind2=" "> <subfield code="c">2020-04-30</subfield> </datafield> <datafield tag="909" ind1="C" ind2="O"> <subfield code="p">openaire_data</subfield> <subfield code="p">user-webis</subfield> <subfield code="o">oai:zenodo.org:3778298</subfield> </datafield> <datafield tag="100" ind1=" " ind2=" "> <subfield code="u">Bauhaus-Universität Weimar</subfield> <subfield code="a">Al-Khatib, Khalid</subfield> </datafield> <datafield tag="245" ind1=" " ind2=" "> <subfield code="a">Webis ChangeMyView Corpus 2020 (Webis-CMV-20)</subfield> </datafield> <datafield tag="980" ind1=" " ind2=" "> <subfield code="a">user-webis</subfield> </datafield> 
<datafield tag="540" ind1=" " ind2=" "> <subfield code="u">https://creativecommons.org/licenses/by/4.0/legalcode</subfield> <subfield code="a">Creative Commons Attribution 4.0 International</subfield> </datafield> <datafield tag="650" ind1="1" ind2="7"> <subfield code="a">cc-by</subfield> <subfield code="2">opendefinition.org</subfield> </datafield> <datafield tag="520" ind1=" " ind2=" "> <subfield code="a"><p>The Webis-CMV-20 dataset comprises all available posts and comments in the <a href="https://reddit.com/r/changemyview">ChangeMyView</a> (CMV) subreddit from the founding of the subreddit in 2005 until September 2017. From these, we have derived two sub-datasets for the tasks of persuasiveness prediction and opinion malleability prediction. In addition, the corpus comprises historical posts by CMV authors and personal characteristics derived from them.</p> <p><strong>Dataset specification</strong></p> <p>All files are in bzip2-compressed <a href="http://jsonlines.org/">JSON Lines</a> format.</p> <ul> <li><strong>threads.jsonl:</strong> contains all the selected discussion threads from CMV</li> <li><strong>pairs.jsonl:</strong> each record contains a submission, a delta-comment, a nondelta-comment, and the comments' similarity score</li> <li><strong>posts_malleability.jsonl:</strong> contains posts for opinion malleability prediction, in the format provided in the original <a href="https://files.pushshift.io/reddit/">Reddit Crawl</a> dataset</li> <li><strong>author_entity_category.jsonl:</strong> each record contains the author and a list of the Wikipedia entities mentioned by the author in messages across all subreddits.
For each mentioned entity, we provide the following data:</li> </ul> <pre><code class="language-json">[title, wikidata_id, wikipedia_page_id, mentioned_entity_title, wikifier_score, subreddit_name, subreddit_id, subreddit_category_name, subreddit_topcategory_name]</code></pre> <ul> <li><strong>author_liwc.jsonl:</strong> personality trait features computed with <a href="https://liwc.wpengine.com/">LIWC</a> for the authors in the pairs.jsonl and posts_malleability.jsonl datasets</li> <li><strong>author_subreddit.jsonl:</strong> for each author, the number of posts (submissions/comments) across all subreddits</li> <li><strong>author_subreddit_category.jsonl:</strong> similar to author_subreddit.jsonl, but with each author's post statistics grouped by top-categories and categories according to <a href="https://snoopsnoo.com/subreddits/">snoopsnoo.com</a></li> </ul></subfield> </datafield> <datafield tag="773" ind1=" " ind2=" "> <subfield code="n">doi</subfield> <subfield code="i">isVersionOf</subfield> <subfield code="a">10.5281/zenodo.3778297</subfield> </datafield> <datafield tag="024" ind1=" " ind2=" "> <subfield code="a">10.5281/zenodo.3778298</subfield> <subfield code="2">doi</subfield> </datafield> <datafield tag="980" ind1=" " ind2=" "> <subfield code="a">dataset</subfield> </datafield> </record>
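The record lists each file as a bzip2-compressed JSON Lines archive with an MD5 checksum. A minimal Python sketch of how a downloaded file might be verified and read is given here. The helper names `md5_of_file` and `iter_jsonl_bz2` and the local file name are hypothetical, chosen for illustration; the checksum shown is the one the record lists for pairs.jsonl.bz2. The record does not document the field names inside each JSON object, so the sketch only yields the parsed objects without assuming any keys.

```python
import bz2
import hashlib
import json
import os
from typing import Any, Dict, Iterator


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def iter_jsonl_bz2(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON value per non-empty line of a .jsonl.bz2 file."""
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


if __name__ == "__main__":
    # Assumption: the file was downloaded into the current directory.
    local_path = "pairs.jsonl.bz2"
    # MD5 listed in the record for pairs.jsonl.bz2.
    expected_md5 = "f68d4129c7063488832f16e927dbfa1d"
    if os.path.exists(local_path):
        if md5_of_file(local_path) != expected_md5:
            raise SystemExit("checksum mismatch for " + local_path)
        # Inspect the keys of the first record without assuming a schema.
        first = next(iter_jsonl_bz2(local_path), None)
        if isinstance(first, dict):
            print(sorted(first.keys()))
```

Streaming line by line avoids decompressing a whole archive into memory, which matters for the larger files such as threads.jsonl.bz2 (about 660 MB compressed according to the record).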
| | All versions | This version |
---|---|---|
Views | 713 | 713 |
Downloads | 333 | 333 |
Data volume | 148.5 GB | 148.5 GB |
Unique views | 600 | 600 |
Unique downloads | 140 | 140 |