Published December 29, 2023 | Version v1
Dataset Open

Bubble reachers and uncivil discourse in polarized online public sphere comments dataset

  • 1. Universidade Tecnológica Federal do Paraná
  • 2. McGill University
  • 3. University of Toronto

Description

This dataset contains comments in Portuguese and English gathered from various sources, including news websites from Brazil and Canada, social media sites such as Facebook and Reddit, e-commerce reviews, and Wikipedia comments. Each comment is accompanied by a "toxicity" score provided by the Perspective API.
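This page does not describe how the comments were submitted to the Perspective API, so the following is only a minimal sketch of how a single TOXICITY score can be requested. It assumes the public `comments:analyze` endpoint, a request for the TOXICITY attribute alone, and a per-comment language code such as "pt" or "en". The helper names `build_request`, `parse_toxicity` and `score_comment` are hypothetical, and re-scoring a comment today may not reproduce the value stored in the dataset, because the API's models can change over time.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

# Public Perspective API analyze endpoint (the API key goes in the query string).
ANALYZE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_request(text: str, language: str) -> Dict[str, Any]:
    """Build a request body that asks only for the TOXICITY attribute."""
    return {
        "comment": {"text": text},
        "languages": [language],  # assumption: e.g. "pt" or "en" per comment
        "requestedAttributes": {"TOXICITY": {}},
    }


def parse_toxicity(response: Dict[str, Any]) -> float:
    """Extract the summary TOXICITY probability (0.0 to 1.0) from a response."""
    return float(response["attributeScores"]["TOXICITY"]["summaryScore"]["value"])


def score_comment(text: str, language: str, api_key: str) -> float:
    """POST one comment to the API; requires network access and a valid key."""
    url = ANALYZE_URL + "?" + urllib.parse.urlencode({"key": api_key})
    body = json.dumps(build_request(text, language)).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return parse_toxicity(json.load(resp))
```

Keeping request construction and response parsing separate from the network call makes both parts testable offline.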

Disclaimer: This file includes words or language that some readers may consider profane, vulgar or offensive. Given the topic studied in this article, quoting offensive language is academically justified, but neither we nor PLOS endorse the use of these words or the content of the quotes in any way. Likewise, the quotes do not represent our opinions or those of PLOS, and we condemn online harassment and offensive language.

Column information:

  • preprocessed_text: the text after undergoing preprocessing steps;
  • dataset: the given name of the dataset;
  • source: the dataset's source name;
  • dataset_source: a combination of the dataset name with its source to facilitate data aggregation tasks;
  • TOXICITY: a continuous score between 0.0 and 1.0 provided by the Perspective API.
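With the columns above, per-group summaries are straightforward. The following minimal sketch uses only Python's standard library to stream the CSV row by row, which avoids loading the whole ~379 MB file into memory, and computes the number of scored comments and the mean TOXICITY per `dataset_source`. It assumes the file is UTF-8 encoded, has a header row matching the column names above, and stores a missing score as an empty field; none of this is stated on this page. The function names are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple


def mean_toxicity_by_group(
    rows: Iterable[Mapping[str, str]], group_col: str = "dataset_source"
) -> Dict[str, Tuple[int, float]]:
    """Return {group: (count, mean TOXICITY)}, skipping rows with no score."""
    counts: Dict[str, int] = defaultdict(int)
    sums: Dict[str, float] = defaultdict(float)
    for row in rows:
        raw = (row.get("TOXICITY") or "").strip()
        if not raw:
            continue  # assumption: an empty field means the comment was not scored
        score = float(raw)
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"TOXICITY out of range: {score}")
        key = row[group_col]
        counts[key] += 1
        sums[key] += score
    return {key: (counts[key], sums[key] / counts[key]) for key in counts}


def summarize_csv(path: str, group_col: str = "dataset_source") -> Dict[str, Tuple[int, float]]:
    """Stream the CSV from disk and aggregate it without loading it all at once."""
    with open(path, newline="", encoding="utf-8") as f:
        return mean_toxicity_by_group(csv.DictReader(f), group_col)
```

For example, `summarize_csv("Bubble_Reachers_and_Uncivil_Discourse_2023_comments.csv")` groups by the combined `dataset_source` column, while passing `group_col="dataset"` aggregates each dataset across its sources.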

In addition to the comments, there is a spreadsheet containing analyses referenced in the article associated with this dataset.

Files (378.9 MB)

Bubble_Reachers_and_Uncivil_Discourse_2023_comments.csv