Published January 11, 2018 | Version v1
Dataset
Restricted
Posts from a Brazilian anonymous imageboard
Creators
Description
This dataset contains text with toxic content and hate speech.
A set of discussion threads published on a Brazilian anonymous imageboard, a 4chan-style discussion forum. This dataset includes 158,280 user posts in 4,539 threads published between 18 Dec. 2016 and 19 Jan. 2017. The data was collected with a web scraper developed for this purpose, which gathered the textual content and publication date of each post. Images were not collected because they might contain illegal content.
The data was used in the master's thesis "Análise das apropriações do anonimato nas subculturas dos imageboards".
Files
Additional details
Related works
- Is compiled by
- Thesis: https://biblioteca.feevale.br/Vinculo2/000011/00001100.pdf (URL)