Measuring Online Hate on 4chan Using Deep Learning
Description
This is the dataset released with the paper titled: "Measuring Online Hate on 4chan Using Deep Learning".
This dataset contains a collection of 500,000 posts extracted from the /pol/ board (Politically Incorrect) of 4chan using the 4chan API. The dataset is structured as a single CSV file with one column, com, which includes the raw content of the posts.
The dataset does not preserve the structure of threads or replies; instead, it consists of a flat collection of individual posts extracted from /pol/. This format is intended to support applications such as text analysis, natural language processing, and computational social science research by providing a straightforward dataset of raw post content.
Dataset Format
- File Format: CSV (Comma-Separated Values)
- Columns:
  - com: The raw content of the post.
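As a minimal sketch of how the single-column layout described above could be read, the snippet below uses Python's standard csv module. The helper name iter_posts is hypothetical, and the in-memory sample merely stands in for the real file, reusing two rows from the example table on this page.

```python
import csv
import io
from typing import Iterator, TextIO


def iter_posts(handle: TextIO) -> Iterator[str]:
    """Yield the `com` field of each row from an open CSV file handle."""
    reader = csv.DictReader(handle)
    for row in reader:
        text = row.get("com")
        # Skip rows where the field is missing or empty.
        if text:
            yield text


# Small in-memory stand-in for the real file (illustrative rows only).
sample = io.StringIO(
    "com\n"
    '"Why does no one talk about this?"\n'
    '"The government is hiding the truth!"\n'
)
posts = list(iter_posts(sample))
print(len(posts))
```

For the actual file, the same function could be called on a handle opened with open("pol_500K4chan.csv", newline="", encoding="utf-8"). The UTF-8 encoding is an assumption, since this page does not state the file's encoding.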
Source
The posts were extracted from 4chan’s /pol/ board using the official 4chan API. This board is known for hosting discussions on various topics, often with a focus on political content. Due to the nature of the /pol/ board, the content may include offensive language, hate speech, or otherwise sensitive material. Users should exercise caution and consider ethical implications when analysing this dataset.
Potential Use Cases
- Text analysis and natural language processing (NLP).
- Studies on online discourse, extremism, or political polarization.
- Research on language usage and sentiment in online forums.
- Development and testing of machine learning models for text classification or moderation.
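For the use cases above, some light preprocessing may be needed first. The sketch below assumes that the com values are stored as the 4chan API returns them, that is, with HTML markup such as line-break tags, quote links, and escaped entities. This page does not confirm that, so inspect a few rows before relying on it. The function name clean_comment is hypothetical.

```python
import html
import re

# Line-break tags become newlines; every other tag is dropped.
_BR_RE = re.compile(r"<br\s*/?>", re.IGNORECASE)
_TAG_RE = re.compile(r"<[^>]+>")


def clean_comment(raw: str) -> str:
    """Convert an HTML-formatted comment into plain text."""
    text = _BR_RE.sub("\n", raw)
    text = _TAG_RE.sub("", text)
    # Unescape entities last, so that escaped angle brackets typed by
    # a poster are not mistaken for markup and stripped.
    text = html.unescape(text)
    return text.strip()


example = '<a href="#p1" class="quotelink">&gt;&gt;1</a><br>Why does no one talk about this?'
print(clean_comment(example))
```

Text that contains no markup passes through unchanged apart from trimming of surrounding whitespace, so the function is harmless if the stored content turns out to be plain text already.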
Example Data
Here’s an example of what a few rows of the dataset look like:
com |
---|
"Why does no one talk about this?" |
"The government is hiding the truth!" |
"We need to take action against this injustice." |
If you find our dataset useful, please cite our paper:
@article{
}
Files
pol_500K4chan.csv
Total size: 113.1 MB
Checksum | Size |
---|---|
md5:f7adf7f42c51ad79359d79babad7fe3f | 113.0 MB |
md5:0560cad335c590c7b259cd430b4ac2db | 75.1 kB |
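The MD5 checksums listed above can be used to confirm that a download is intact. The sketch below computes a file's MD5 digest in chunks with Python's hashlib. It assumes that the 113.0 MB entry corresponds to pol_500K4chan.csv, which this listing does not state explicitly. The function name md5_of_file is hypothetical.

```python
import hashlib

# Checksum copied from the file listing; assumed to belong to pol_500K4chan.csv.
EXPECTED_MD5 = "f7adf7f42c51ad79359d79babad7fe3f"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read until the file is exhausted (read() returns b"" at EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Compare a file's digest with the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Calling is_intact("pol_500K4chan.csv") after downloading would return True only if the local copy matches the listed checksum.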