Dataset Open Access
Al-Khatib, Khalid;
Völske, Michael;
Syed, Shahbaz;
Kolyada, Nikolay;
Stein, Benno
The Webis-CMV-20 dataset comprises all available posts and comments in the ChangeMyView (CMV) subreddit, from the foundation of the subreddit in 2005 until September 2017. From these, we have derived two sub-datasets, one for the task of persuasiveness prediction and one for the task of opinion malleability prediction. In addition, the corpus comprises historical posts by CMV authors and derived personal characteristics.
Dataset specification
All files are in bzip2-compressed JSON Lines format.
[title, wikidata_id, wikipedia_page_id, mentioned_entity_title, wikifier_score, subreddit_name, subreddit_id, subreddit_category_name, subreddit_topcategory_name]
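Because every file is bzip2-compressed JSON Lines, it can be read as a stream, one JSON object per line, without decompressing it to disk first. The following is a minimal sketch using only the Python standard library; the helper name `iter_records` is hypothetical and not part of the dataset, and UTF-8 encoding is an assumption.

```python
import bz2
import json
from typing import Any, Dict, Iterator


def iter_records(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line of a .jsonl.bz2 file.

    Hypothetical helper for illustration. Assumes the file is UTF-8 encoded
    and that every non-empty line holds exactly one JSON object.
    """
    # bz2.open in text mode decompresses lazily, so large files
    # are never loaded into memory as a whole.
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

For a first look at a file, something like `next(iter_records("pairs.jsonl.bz2")).keys()` would show the field names of the first record, assuming the file has been downloaded to the working directory.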
Name | MD5 | Size |
---|---|---|
author_entity_category.jsonl.bz2 | e41bdc8e1a48b900e89d1b8f55c820a0 | 1.7 GB |
author_liwc.jsonl.bz2 | 3c5edd9ceeac9ddbe2d6d9e6695ec006 | 14.7 MB |
author_subreddit.jsonl.bz2 | 4dd316a68b35ae58ccf4b0d726f31817 | 7.2 MB |
author_subreddit_category.jsonl.bz2 | 76ab3db22c805e22847362dcf680911d | 6.0 MB |
pairs.jsonl.bz2 | f68d4129c7063488832f16e927dbfa1d | 18.1 MB |
posts_malleability.jsonl.bz2 | 5a413ea0dd5c6ee391b623588ec71e00 | 426.6 MB |
threads.jsonl.bz2 | d9cd9aa21bf22d80dc298059af823310 | 660.9 MB |
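The MD5 checksums listed with each file can be used to check that a download is complete and uncorrupted. The sketch below shows one way to do this with the Python standard library; the function names `md5_of_file` and `matches_listed_md5` are hypothetical, and the chunk size is an arbitrary choice.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path: str, listed: str) -> bool:
    """Compare a file's MD5 with a listed value, with or without an 'md5:' prefix."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_md5("pairs.jsonl.bz2", "f68d4129c7063488832f16e927dbfa1d")` should return `True` for an intact copy of that file, assuming it was saved under that name in the working directory.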