Measuring Online Hate on 4chan Using Deep Learning
Description
This dataset accompanies the paper "Measuring Online Hate on 4chan Using Deep Learning".
This dataset contains a collection of 500,000 posts extracted from the /pol/ board (Politically Incorrect) of 4chan using the 4chan API. The dataset is structured as a single CSV file with one column, com, which includes the raw content of the posts.
The dataset does not preserve the structure of threads or replies; instead, it consists of a flat collection of individual posts extracted from /pol/. This format is intended to support applications such as text analysis, natural language processing, and computational social science research by providing a straightforward dataset of raw post content.
Dataset Format
- File Format: CSV (Comma-Separated Values)
- Columns:
  - com: The raw content of the post.
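As a minimal sketch of how the single-column layout described above could be read, the following uses only the Python standard library. The helper name `load_posts`, the UTF-8 encoding, and the presence of a `com` header row are assumptions for illustration, not details confirmed by the dataset description.

```python
import csv
from itertools import islice
from pathlib import Path


def load_posts(path, limit=None):
    """Return the values of the `com` column as a list of strings.

    `limit` caps the number of rows read; None reads the whole file.
    Assumes a header row naming the column `com` and UTF-8 text.
    """
    # newline="" lets the csv module handle line breaks inside quoted fields.
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        return [row["com"] for row in islice(reader, limit)]


if __name__ == "__main__":
    csv_path = Path("pol_500K4chan.csv")  # assumed to sit in the working directory
    if csv_path.exists():
        for post in load_posts(csv_path, limit=5):
            print(post)
```

Reading with a `limit` first is a cheap way to inspect the file before loading all 500,000 rows into memory.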
Source
The posts were extracted from 4chan’s /pol/ board using the official 4chan API. This board is known for hosting discussions on various topics, often with a focus on political content. Due to the nature of the /pol/ board, the content may include offensive language, hate speech, or otherwise sensitive material. Users should exercise caution and consider ethical implications when analysing this dataset.
Potential Use Cases
- Text analysis and natural language processing (NLP).
- Studies on online discourse, extremism, or political polarization.
- Research on language usage and sentiment in online forums.
- Development and testing of machine learning models for text classification or moderation.
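For the text-analysis and classification use cases listed above, posts usually need light normalization first. The sketch below assumes that the `com` field still carries the HTML markup and escaped entities that the 4chan API returns in post bodies (for example `<br>` and `&gt;`); if the released file is already plain text, the function leaves it essentially unchanged. The name `clean_post` is hypothetical.

```python
import html
import re

_BR_RE = re.compile(r"<br\s*/?>", re.IGNORECASE)
_TAG_RE = re.compile(r"<[^>]+>")


def clean_post(raw):
    """Convert an HTML-flavoured post body to plain text.

    Assumption: markup is limited to simple tags and escaped entities.
    """
    text = _BR_RE.sub("\n", raw)   # keep line structure before dropping tags
    text = _TAG_RE.sub("", text)   # strip remaining tags such as <a> and <span>
    text = html.unescape(text)     # decode entities only after tag removal
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in text.split("\n"))
    return "\n".join(line for line in lines if line)
```

Entities are decoded after tags are removed so that escaped angle brackets in a post are kept as literal text rather than being mistaken for markup.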
Example Data
Here’s an example of what a few rows of the dataset look like:
| com |
|---|
| "Why does no one talk about this?" |
| "The government is hiding the truth!" |
| "We need to take action against this injustice." |
Citation
If you find our dataset useful, please cite our paper:
@article{
}
Files
pol_500K4chan.csv
(113.1 MB total)
| Size | MD5 checksum |
|---|---|
| 113.0 MB | f7adf7f42c51ad79359d79babad7fe3f |
| 75.1 kB | 0560cad335c590c7b259cd430b4ac2db |
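The MD5 checksums listed for the files can be used to confirm that a download is intact. The following is a minimal sketch; the helper name `md5_of_file` is hypothetical, and pairing the 113.0 MB checksum with `pol_500K4chan.csv` is an assumption, since the listing does not show file names next to the checksums.

```python
import hashlib
from pathlib import Path

# Assumption: the 113.0 MB entry in the file listing is pol_500K4chan.csv.
EXPECTED_MD5 = "f7adf7f42c51ad79359d79babad7fe3f"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    csv_path = Path("pol_500K4chan.csv")  # assumed download location
    if csv_path.exists():
        actual = md5_of_file(csv_path)
        print("checksum matches" if actual == EXPECTED_MD5 else f"mismatch: {actual}")
```

MD5 is used here only because it is the checksum the listing provides; it detects accidental corruption but is not a safeguard against deliberate tampering.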