Dataset Open Access
Al-Khatib, Khalid;
Völske, Michael;
Syed, Shahbaz;
Kolyada, Nikolay;
Stein, Benno
<?xml version='1.0' encoding='utf-8'?> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:adms="http://www.w3.org/ns/adms#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dct="http://purl.org/dc/terms/" xmlns:dctype="http://purl.org/dc/dcmitype/" xmlns:dcat="http://www.w3.org/ns/dcat#" xmlns:duv="http://www.w3.org/ns/duv#" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:frapo="http://purl.org/cerif/frapo/" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:gsp="http://www.opengis.net/ont/geosparql#" xmlns:locn="http://www.w3.org/ns/locn#" xmlns:org="http://www.w3.org/ns/org#" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:prov="http://www.w3.org/ns/prov#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:schema="http://schema.org/" xmlns:skos="http://www.w3.org/2004/02/skos/core#" xmlns:vcard="http://www.w3.org/2006/vcard/ns#" xmlns:wdrs="http://www.w3.org/2007/05/powder-s#"> <rdf:Description rdf:about="https://doi.org/10.5281/zenodo.3778298"> <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/> <dct:type rdf:resource="http://purl.org/dc/dcmitype/Dataset"/> <dct:identifier rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">https://doi.org/10.5281/zenodo.3778298</dct:identifier> <foaf:page rdf:resource="https://doi.org/10.5281/zenodo.3778298"/> <dct:creator> <rdf:Description> <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Agent"/> <foaf:name>Al-Khatib, Khalid</foaf:name> <foaf:givenName>Khalid</foaf:givenName> <foaf:familyName>Al-Khatib</foaf:familyName> <org:memberOf> <foaf:Organization> <foaf:name>Bauhaus-Universität Weimar</foaf:name> </foaf:Organization> </org:memberOf> </rdf:Description> </dct:creator> <dct:creator> <rdf:Description rdf:about="http://orcid.org/0000-0002-9283-6846"> <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Agent"/> <dct:identifier rdf:datatype="http://www.w3.org/2001/XMLSchema#string">0000-0002-9283-6846</dct:identifier> <foaf:name>Völske, Michael</foaf:name> 
<foaf:givenName>Michael</foaf:givenName> <foaf:familyName>Völske</foaf:familyName> <org:memberOf> <foaf:Organization> <foaf:name>Bauhaus-Universität Weimar</foaf:name> </foaf:Organization> </org:memberOf> </rdf:Description> </dct:creator> <dct:creator> <rdf:Description rdf:about="http://orcid.org/0000-0002-4821-1507"> <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Agent"/> <dct:identifier rdf:datatype="http://www.w3.org/2001/XMLSchema#string">0000-0002-4821-1507</dct:identifier> <foaf:name>Syed, Shahbaz</foaf:name> <foaf:givenName>Shahbaz</foaf:givenName> <foaf:familyName>Syed</foaf:familyName> <org:memberOf> <foaf:Organization> <foaf:name>Leipzig University</foaf:name> </foaf:Organization> </org:memberOf> </rdf:Description> </dct:creator> <dct:creator> <rdf:Description rdf:about="http://orcid.org/0000-0002-6493-9557"> <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Agent"/> <dct:identifier rdf:datatype="http://www.w3.org/2001/XMLSchema#string">0000-0002-6493-9557</dct:identifier> <foaf:name>Kolyada, Nikolay</foaf:name> <foaf:givenName>Nikolay</foaf:givenName> <foaf:familyName>Kolyada</foaf:familyName> <org:memberOf> <foaf:Organization> <foaf:name>Bauhaus-Universität Weimar</foaf:name> </foaf:Organization> </org:memberOf> </rdf:Description> </dct:creator> <dct:creator> <rdf:Description rdf:about="http://orcid.org/0000-0001-9033-2217"> <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Agent"/> <dct:identifier rdf:datatype="http://www.w3.org/2001/XMLSchema#string">0000-0001-9033-2217</dct:identifier> <foaf:name>Stein, Benno</foaf:name> <foaf:givenName>Benno</foaf:givenName> <foaf:familyName>Stein</foaf:familyName> <org:memberOf> <foaf:Organization> <foaf:name>Bauhaus-Universität Weimar</foaf:name> </foaf:Organization> </org:memberOf> </rdf:Description> </dct:creator> <dct:title>Webis ChangeMyView Corpus 2020 (Webis-CMV-20)</dct:title> <dct:publisher> <foaf:Agent> <foaf:name>Zenodo</foaf:name> </foaf:Agent> </dct:publisher> <dct:issued 
rdf:datatype="http://www.w3.org/2001/XMLSchema#gYear">2020</dct:issued> <dcat:keyword>social media</dcat:keyword> <dcat:keyword>argumentation</dcat:keyword> <dcat:keyword>persuasiveness</dcat:keyword> <dct:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2020-04-30</dct:issued> <dct:language rdf:resource="http://publications.europa.eu/resource/authority/language/ENG"/> <owl:sameAs rdf:resource="https://zenodo.org/record/3778298"/> <adms:identifier> <adms:Identifier> <skos:notation rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">https://zenodo.org/record/3778298</skos:notation> <adms:schemeAgency>url</adms:schemeAgency> </adms:Identifier> </adms:identifier> <dct:isVersionOf rdf:resource="https://doi.org/10.5281/zenodo.3778297"/> <dct:isPartOf rdf:resource="https://zenodo.org/communities/webis"/> <dct:description><p>The Webis-CMV-20 dataset comprises all&nbsp;available posts and comments in the <a href="https://reddit.com/r/changemyview">ChangeMyView</a>&nbsp;subreddit&nbsp;from the foundation of the subreddit&nbsp;in 2005, until September 2017. From these, we have derived two sub-datasets for the tasks of persuasiveness prediction, and opinion malleability prediction. 
In addition, the corpus comprises historical posts by CMV authors and derived personal characteristics.</p> <p><strong>Dataset specification</strong></p> <p>All files are in bzip2-compressed <a href="http://jsonlines.org/">JSON Lines</a> format.</p> <ul> <li><strong>threads.jsonl:</strong> contains all the selected discussion threads from CMV</li> <li><strong>pairs.jsonl:</strong> each record contains a submission, a delta-comment, a nondelta-comment, and the comments&#39;&nbsp;similarity score</li> <li><strong>posts_malleability.jsonl:</strong> contains&nbsp;posts&nbsp;for&nbsp;opinion malleability prediction,&nbsp;in the format provided in the original <a href="https://files.pushshift.io/reddit/">Reddit Crawl</a> dataset</li> <li><strong>author_entity_category.jsonl:</strong> each record contains the author and a list of Wikipedia entities mentioned by the author in messages across all subreddits. For each mentioned entity we provide the following data:&nbsp;</li> </ul> <pre><code class="language-json">[title, wikidata_id, wikipedia_page_id, mentioned_entity_title, wikifier_score, subreddit_name, subreddit_id, subreddit_category_name, subreddit_topcategory_name]</code></pre> <ul> <li><strong>author_liwc.jsonl:</strong>&nbsp;personality trait features computed with <a href="https://liwc.wpengine.com/">LIWC</a> for the authors from the pairs.jsonl and posts_malleability.jsonl datasets</li> <li><strong>author_subreddit.jsonl:</strong> for each author, statistics on the number of posts (submissions/comments) across all subreddits</li> <li><strong>author_subreddit_category.jsonl:</strong> similar to author_subreddit.jsonl, but the statistics of all author posts are grouped by top-categories and categories according to <a href="https://snoopsnoo.com/subreddits/">snoopsnoo.com</a><br> &nbsp;</li> </ul></dct:description> <dct:accessRights rdf:resource="http://publications.europa.eu/resource/authority/access-right/PUBLIC"/> <dct:accessRights> <dct:RightsStatement 
rdf:about="info:eu-repo/semantics/openAccess"> <rdfs:label>Open Access</rdfs:label> </dct:RightsStatement> </dct:accessRights> <dcat:distribution> <dcat:Distribution> <dct:license rdf:resource="https://creativecommons.org/licenses/by/4.0/legalcode"/> <dcat:accessURL rdf:resource="https://doi.org/10.5281/zenodo.3778298"/> </dcat:Distribution> </dcat:distribution> <dcat:distribution> <dcat:Distribution> <dcat:accessURL rdf:resource="https://doi.org/10.5281/zenodo.3778298"/> <dcat:byteSize>1651067128</dcat:byteSize> <dcat:downloadURL rdf:resource="https://zenodo.org/record/3778298/files/author_entity_category.jsonl.bz2"/> </dcat:Distribution> </dcat:distribution> <dcat:distribution> <dcat:Distribution> <dcat:accessURL rdf:resource="https://doi.org/10.5281/zenodo.3778298"/> <dcat:byteSize>14720534</dcat:byteSize> <dcat:downloadURL rdf:resource="https://zenodo.org/record/3778298/files/author_liwc.jsonl.bz2"/> </dcat:Distribution> </dcat:distribution> <dcat:distribution> <dcat:Distribution> <dcat:accessURL rdf:resource="https://doi.org/10.5281/zenodo.3778298"/> <dcat:byteSize>6040755</dcat:byteSize> <dcat:downloadURL rdf:resource="https://zenodo.org/record/3778298/files/author_subreddit_category.jsonl.bz2"/> </dcat:Distribution> </dcat:distribution> <dcat:distribution> <dcat:Distribution> <dcat:accessURL rdf:resource="https://doi.org/10.5281/zenodo.3778298"/> <dcat:byteSize>7229520</dcat:byteSize> <dcat:downloadURL rdf:resource="https://zenodo.org/record/3778298/files/author_subreddit.jsonl.bz2"/> </dcat:Distribution> </dcat:distribution> <dcat:distribution> <dcat:Distribution> <dcat:accessURL rdf:resource="https://doi.org/10.5281/zenodo.3778298"/> <dcat:byteSize>18093525</dcat:byteSize> <dcat:downloadURL rdf:resource="https://zenodo.org/record/3778298/files/pairs.jsonl.bz2"/> </dcat:Distribution> </dcat:distribution> <dcat:distribution> <dcat:Distribution> <dcat:accessURL rdf:resource="https://doi.org/10.5281/zenodo.3778298"/> <dcat:byteSize>426602510</dcat:byteSize> 
<dcat:downloadURL rdf:resource="https://zenodo.org/record/3778298/files/posts_malleability.jsonl.bz2"/> </dcat:Distribution> </dcat:distribution> <dcat:distribution> <dcat:Distribution> <dcat:accessURL rdf:resource="https://doi.org/10.5281/zenodo.3778298"/> <dcat:byteSize>660888940</dcat:byteSize> <dcat:downloadURL rdf:resource="https://zenodo.org/record/3778298/files/threads.jsonl.bz2"/> </dcat:Distribution> </dcat:distribution> </rdf:Description> </rdf:RDF>
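
The description above states that all files are bzip2-compressed JSON Lines. As a minimal sketch of how such a file could be streamed record by record, assuming a file has already been downloaded locally (the helper name `read_jsonl_bz2` and the local path in the usage comment are illustrative, not part of the dataset), one might write:

```python
import bz2
import json
from typing import Iterator


def read_jsonl_bz2(path: str) -> Iterator[dict]:
    """Yield one parsed JSON record per non-empty line of a .jsonl.bz2 file.

    Hypothetical helper for illustration; the record fields are not assumed here.
    """
    # bz2.open in text mode decompresses lazily, so large files such as
    # threads.jsonl.bz2 need not fit in memory at once.
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


# Example usage (assumes pairs.jsonl.bz2 is in the working directory):
# for record in read_jsonl_bz2("pairs.jsonl.bz2"):
#     print(sorted(record.keys()))
#     break
```

Reading lazily through a generator is a design choice for the larger files; for the smaller ones, `list(read_jsonl_bz2(path))` would also work.
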
 | All versions | This version |
---|---|---|
Views | 713 | 713 |
Downloads | 333 | 333 |
Data volume | 148.5 GB | 148.5 GB |
Unique views | 600 | 600 |
Unique downloads | 140 | 140 |