
Dataset of discussion threads from Meneame

Pablo Aragón; Vicenç Gómez; Andreas Kaltenbrunner

Dataset from our ICWSM 2017 paper. When using this resource, please use the following citation:

Aragón P., Gómez V., Kaltenbrunner A. (2017) To Thread or Not to Thread: The Impact of Conversation Threading on Online Discussion, ICWSM-17 - 11th International AAAI Conference on Web and Social Media, Montreal, Canada.

@inproceedings {aragon2017ICWSM,
author = {Arag\'on, Pablo and G\'omez, Vicen\c{c} and Kaltenbrunner, Andreas},
title = {To Thread or Not to Thread: The Impact of Conversation Threading on Online Discussion},
booktitle = {ICWSM-17 - 11th International AAAI Conference on Web and Social Media},
publisher = {The AAAI Press},
location = {Montreal, Canada},
year = 2017
}

More info about this dataset can also be found at:

Aragón P., Gómez V., Kaltenbrunner A. (2017) Detecting Platform Effects in Online Discussions, Policy & Internet, 9(4), 420-443.

@article{aragon2017PI,
author = {Arag\'on, Pablo and G\'omez, Vicen\c{c} and Kaltenbrunner, Andreas},
title = {Detecting Platform Effects in Online Discussions},
journal = {Policy \& Internet},
volume = {9},
number = {4},
pages = {420-443},
doi = {10.1002/poi3.158},
url = {https://onlinelibrary.wiley.com/doi/abs/10.1002/poi3.158},
eprint = {https://onlinelibrary.wiley.com/doi/pdf/10.1002/poi3.158},
year = {2017}
}

 

Crawling process

We built a crawler that collects all the stories that appeared on the front page of Meneame from 2011 to 2015 (both years included). A second crawler then collected every comment from the discussion thread of each story. Together, the two crawls yielded 72,005 stories and 5,385,324 comments.

Two issues were taken into account when the crawler was designed. First, the machine-readable robots.txt file on Meneame does not disallow this process. Second, the footer of Meneame indicates the licenses of the code, graphics and content of the website. The license for content is Attribution 3.0 Spain (CC BY 3.0 ES), which allows us to release this dataset.

Fields

Every discussion thread is stored in a JSON file named with the URL slug of the corresponding story in Meneame, located in a yyyy-mm-dd folder. The JSON file is an array of elements with the following fields:

  • id (string): ID of the story/comment

  • sent (timestamp): Date of the story/comment as yyyy-MM-ddThh:mm:ssZ.

  • message (string): Text of the story/comment

  • user (string): Username of the author of the story/comment

  • karma (number): Karma score of the comment when the crawling was performed

  • comments_count (number): Number of comments in reply to the story/comment

  • votes (number): Number of votes to the story/comment

  • thread (string): URL of the thread

  • thread_id (string): Sequential order of arrival in the thread (0 if story, >=1 if comment)

  • depth (string): Depth within the thread (0 if story, >=1 if comment)

  • url (string): URL of the specific story/comment

 

  • title (string): Title, only available for stories.

  • published (string): Date when published on the front page, only available for stories.

  • tags (string): Tags, only available for stories.

  • clics (string): Number of clicks, only available for stories.

  • users (string): Number of user votes, only available for stories.

  • anonymous (string): Number of anonymous votes, only available for stories.

  • negatives (string): Number of negative votes, only available for stories.

 

  • in_reply_to_id (string): ID of the parent story/comment, only available for comments.

  • in_reply_to_user (string): Authoring user of the parent story/comment, only available for comments.

  • in_reply_to_thread_id (string): Sequential order of arrival in the thread of the parent story/comment, only available for comments.
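
As an illustration of how these fields fit together, the following Python sketch separates the story from its comments and rebuilds the reply structure of one thread. It is not part of the dataset or of the authors' code: the function names and the SAMPLE records are hypothetical, the sample values are invented for the example, and only the field names come from the list above. Numeric fields documented as strings are converted with int() before comparison.

```python
import json
from collections import defaultdict


def load_thread(path):
    """Read one thread file: a JSON array of story/comment elements."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def split_story_and_comments(elements):
    """Separate the story (thread_id 0) from its comments (thread_id >= 1)."""
    story = None
    comments = []
    for element in elements:
        # thread_id is documented as a string, so convert before comparing.
        if int(element["thread_id"]) == 0:
            story = element
        else:
            comments.append(element)
    return story, comments


def children_by_parent(comments):
    """Map each parent ID to the IDs of its direct replies, in input order."""
    children = defaultdict(list)
    for comment in comments:
        children[comment["in_reply_to_id"]].append(comment["id"])
    return dict(children)


def max_depth(comments):
    """Deepest nesting level reached by any comment (0 if there are none)."""
    return max((int(c["depth"]) for c in comments), default=0)


# Hypothetical records that only mimic the documented fields.
SAMPLE = [
    {"id": "s1", "thread_id": "0", "depth": "0", "user": "alice"},
    {"id": "c1", "thread_id": "1", "depth": "1", "user": "bob",
     "in_reply_to_id": "s1"},
    {"id": "c2", "thread_id": "2", "depth": "2", "user": "carol",
     "in_reply_to_id": "c1"},
    {"id": "c3", "thread_id": "3", "depth": "1", "user": "dave",
     "in_reply_to_id": "s1"},
]

if __name__ == "__main__":
    story, comments = split_story_and_comments(SAMPLE)
    print(story["id"], len(comments))
    print(children_by_parent(comments))
    print(max_depth(comments))
```

For a real thread, load_thread would be called with the path of one JSON file inside a yyyy-mm-dd folder, and its result passed to the other functions in place of SAMPLE.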

Acknowledgment

This work is supported by the Spanish Ministry of Economy and Competitiveness under the María de Maeztu Units of Excellence Programme (MDM-2015-0502).

Files (920.8 MB)

meneame.zip (920.8 MB), md5:0b8a28677d7b0746f818897e4ed06697
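
The MD5 checksum listed for meneame.zip can be used to check that a download completed without corruption. The following is a minimal sketch using Python's standard hashlib module; the function names and the local path are assumptions made for the example, and only the expected checksum comes from this page.

```python
import hashlib

# Checksum listed on this page for meneame.zip.
EXPECTED_MD5 = "0b8a28677d7b0746f818897e4ed06697"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in fixed-size chunks so the archive is never fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected
```

Calling is_intact("meneame.zip") with the path of the downloaded archive (a hypothetical local path) returns True when the digest matches.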