2536218
doi
10.5281/zenodo.2536218
oai:zenodo.org:2536218
user-mdm-dtic-upf
Vicenç, Gómez
Universitat Pompeu Fabra
Andreas, Kaltenbrunner
Universitat Pompeu Fabra
Dataset of discussion threads from Meneame
Pablo, Aragón
Universitat Pompeu Fabra
doi:10.1002/poi3.158
handle:10230/34365
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
online conversations
online discussions
discussion threads
social news
meneame
<p>Dataset from our ICWSM 2017 paper. When using this resource, please use the following citation:</p>
<blockquote>
<p>Aragón P., Gómez V., Kaltenbrunner A. (2017) To Thread or Not to Thread: The Impact of Conversation Threading on Online Discussion, ICWSM-17 - 11th International AAAI Conference on Web and Social Media, Montreal, Canada.</p>
</blockquote>
<blockquote>
<p>@inproceedings {aragon2017ICWSM,<br>
author = {Arag\'on, Pablo and G\'omez, Vicen\c{c} and Kaltenbrunner, Andreas},<br>
title = {To Thread or Not to Thread: The Impact of Conversation Threading on Online Discussion},<br>
booktitle = {ICWSM-17 - 11th International AAAI Conference on Web and Social Media},<br>
publisher = {The AAAI Press},<br>
location = {Montreal, Canada},<br>
year = 2017<br>
}</p>
</blockquote>
<p>More info about this dataset can also be found at:</p>
<blockquote>
<p>Aragón P., Gómez V., Kaltenbrunner A. (2017) Detecting Platform Effects in Online Discussions, Policy & Internet, 9(4), 420-443.</p>
</blockquote>
<blockquote>
<p>@article{aragon2017PI,<br>
author = {Arag\'on, Pablo and G\'omez, Vicen\c{c} and Kaltenbrunner, Andreas},<br>
title = {Detecting Platform Effects in Online Discussions},<br>
journal = {Policy \& Internet},<br>
volume = {9},<br>
number = {4},<br>
pages = {420-443},<br>
doi = {10.1002/poi3.158},<br>
url = {https://onlinelibrary.wiley.com/doi/abs/10.1002/poi3.158},<br>
eprint = {https://onlinelibrary.wiley.com/doi/pdf/10.1002/poi3.158},<br>
year = {2017}<br>
}</p>
</blockquote>
<p> </p>
<p><strong>Crawling process</strong></p>
<p>We built a crawler that collected every story that reached the front page of Meneame from 2011 to 2015 (both years included). A second crawler then collected every comment in the discussion thread of each story. Together, the two crawls yielded 72,005 stories and 5,385,324 comments.</p>
<p>Two issues were taken into account when the crawlers were designed. First, the <a href="https://www.meneame.net/robots.txt">machine-readable robots.txt</a> file on Meneame does not disallow this process. Second, the footer of the Meneame website states the licenses of the code, graphics and content of the site. The license for content is <a href="https://creativecommons.org/licenses/by/3.0/es/deed.en">Attribution 3.0 Spain (CC BY 3.0 ES)</a>, which allows us to release this dataset.</p>
<p><strong>Fields</strong></p>
<p>Every discussion thread is stored in a JSON file named with the URL slug of the corresponding story in Meneame, located in a yyyy-mm-dd folder. The JSON file is an array of elements with the following fields:</p>
<ul>
<li>
<p>id (string): ID of the story/comment</p>
</li>
<li>
<p>sent (timestamp): Date of the story/comment as yyyy-MM-ddThh:mm:ssZ.</p>
</li>
<li>
<p>message (string): Text of the story/comment</p>
</li>
<li>
<p>user (string): Username of the author of the story/comment</p>
</li>
<li>
<p>karma (number): Karma score of the comment at the time of crawling</p>
</li>
<li>
<p>comments_count (number): Number of comments in reply to the story/comment</p>
</li>
<li>
<p>votes (number): Number of votes to the story/comment</p>
</li>
<li>
<p>thread (string): URL of the thread</p>
</li>
<li>
<p>thread_id (string): Sequential order of arrival in the thread (0 if story, >=1 if comment)</p>
</li>
<li>
<p>depth (string): Depth within the thread (0 if story, >=1 if comment)</p>
</li>
<li>
<p>url (string): URL of the specific story/comment</p>
</li>
</ul>
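<p>The folder layout and the common fields above can be read with a few lines of Python. The following is a minimal sketch, not the authors' tooling: the helper names (<code>iter_thread_files</code>, <code>load_thread</code>, <code>parse_sent</code>, <code>split_thread</code>) are hypothetical, the files are assumed to be UTF-8 encoded, and the trailing <code>Z</code> in <code>sent</code> is assumed to be either a literal UTC marker or a numeric offset.</p>

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def iter_thread_files(root: Path):
    """Yield (date_folder, json_path) pairs for files under root/yyyy-mm-dd/."""
    for path in sorted(root.glob("*/*.json")):
        try:
            # Skip folders whose name is not a yyyy-mm-dd date.
            datetime.strptime(path.parent.name, "%Y-%m-%d")
        except ValueError:
            continue
        yield path.parent.name, path


def load_thread(path: Path) -> list:
    """Load one thread file (assumed UTF-8): a JSON array of story/comment elements."""
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def parse_sent(value: str) -> datetime:
    """Parse a `sent` value into a timezone-aware UTC datetime.

    %z accepts both a literal 'Z' and numeric offsets such as '+0100'.
    """
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z")
    return parsed.astimezone(timezone.utc)


def split_thread(elements: list) -> tuple:
    """Return (story, comments), with comments sorted by arrival order.

    thread_id and depth are documented as strings, so both are cast to int.
    """
    ordered = sorted(elements, key=lambda e: int(e["thread_id"]))
    stories = [e for e in ordered if int(e["depth"]) == 0]
    comments = [e for e in ordered if int(e["depth"]) >= 1]
    if len(stories) != 1:
        raise ValueError(f"expected exactly one story, found {len(stories)}")
    return stories[0], comments
```

<p>A typical use would iterate over <code>iter_thread_files(Path("meneame"))</code>, where <code>meneame</code> is a hypothetical name for the folder obtained by unpacking the archive, and pass each path to <code>load_thread</code> and <code>split_thread</code>.</p>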
<p> </p>
<ul>
<li>
<p>title (string): Title, only available for stories.</p>
</li>
<li>
<p>published (string): Date when published on the front page, only available for stories.</p>
</li>
<li>
<p>tags (string): Tags, only available for stories.</p>
</li>
<li>
<p>clics (string): Number of clicks, only available for stories.</p>
</li>
<li>
<p>users (string): Number of user votes, only available for stories.</p>
</li>
<li>
<p>anonymous (string): Number of anonymous votes, only available for stories.</p>
</li>
<li>
<p>negatives (string): Number of negative votes, only available for stories.</p>
</li>
</ul>
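<p>Several of the story-only fields above hold counts but are typed as strings. A minimal sketch of converting them to integers follows. The helper name <code>story_counts</code> is hypothetical, and values that are missing or not numeric are mapped to <code>None</code> rather than guessed. The <code>tags</code> field is left untouched because its internal format is not specified here.</p>

```python
# Story-only fields documented as strings that hold counts.
STORY_COUNT_FIELDS = ("clics", "users", "anonymous", "negatives")


def story_counts(story: dict) -> dict:
    """Return the story-only count fields as integers (None when unavailable)."""
    counts = {}
    for field in STORY_COUNT_FIELDS:
        raw = story.get(field)
        try:
            counts[field] = int(raw)
        except (TypeError, ValueError):
            # Field absent (e.g. the element is a comment) or not a number.
            counts[field] = None
    return counts
```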
<p> </p>
<ul>
<li>
<p>in_reply_to_id (string): ID of the parent story/comment, only available for comments.</p>
</li>
<li>
<p>in_reply_to_user (string): Author of the parent story/comment, only available for comments.</p>
</li>
<li>
<p>in_reply_to_thread_id (string): Sequential order of arrival in the thread of the parent story/comment, only available for comments.</p>
</li>
</ul>
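<p>The reply fields above are enough to rebuild the tree of a discussion. The sketch below assumes that <code>in_reply_to_id</code> of a top-level comment holds the <code>id</code> of the story, and that the field is absent or null for the story itself. The function names are hypothetical. The recomputed depths can be compared against the stored <code>depth</code> field as a sanity check.</p>

```python
from collections import defaultdict, deque


def build_children(elements: list) -> dict:
    """Map each parent id to the ids of its direct replies, in arrival order."""
    children = defaultdict(list)
    for element in sorted(elements, key=lambda e: int(e["thread_id"])):
        parent_id = element.get("in_reply_to_id")
        if parent_id is not None:
            children[parent_id].append(element["id"])
    return dict(children)


def compute_depths(root_id: str, children: dict) -> dict:
    """Breadth-first traversal from the story; returns id -> depth.

    Comments whose parent is missing from the file are not reachable
    from the root and therefore do not appear in the result.
    """
    depths = {root_id: 0}
    queue = deque([root_id])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in depths:
                depths[child] = depths[node] + 1
                queue.append(child)
    return depths
```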
<p><strong>Acknowledgment</strong></p>
<p>This work is supported by the Spanish Ministry of Economy and Competitiveness under the María de Maeztu Units of Excellence Programme (MDM-2015-0502).</p>
Zenodo
2019-01-14
info:eu-repo/semantics/other
2536217
user-mdm-dtic-upf
1579893981.019619
920783644
md5:0b8a28677d7b0746f818897e4ed06697
https://zenodo.org/records/2536218/files/meneame.zip
public
10.1002/poi3.158
Is supplement to
doi
10230/34365
Is supplement to
handle
10.5281/zenodo.2536217
isVersionOf
doi