Published February 10, 2025 | Version v1
Dataset Open

Measuring Online Hate on 4chan Using Deep Learning

  • 1. Royal Holloway, University of London
  • 2. University of Surrey

Description

This dataset accompanies the paper "Measuring Online Hate on 4chan Using Deep Learning".

The dataset contains 500,000 posts collected from 4chan's /pol/ (Politically Incorrect) board through the 4chan API. It is distributed as a single CSV file with one column, com, which holds the raw content of each post.

The dataset does not preserve thread or reply structure; it is a flat collection of individual posts from /pol/. This format is intended to support text analysis, natural language processing, and computational social science research by providing raw post content with no additional metadata.

Dataset Format

  • File Format: CSV (Comma-Separated Values)
  • Columns:
    • com: The raw content of the post.
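Given the single-column layout above, the file can be read with Python's standard csv module. The following is a minimal sketch, not part of the dataset release: the function name iter_posts is hypothetical, and it assumes the file is UTF-8 encoded and has com as its header row, as in the example data.

```python
import csv
from typing import Iterator


def iter_posts(path: str) -> Iterator[str]:
    """Yield the raw text of each post from the `com` column, one row at a time."""
    # newline="" lets the csv module handle line breaks inside quoted fields.
    # Assumption: the file is UTF-8 encoded with a header row named `com`.
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            text = row.get("com")
            if text:  # skip rows with an empty or missing comment
                yield text
```

Streaming rows this way avoids loading the whole file into memory; for example, sum(1 for _ in iter_posts("pol_500K4chan.csv")) counts the non-empty posts.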

Source

The posts were extracted from 4chan’s /pol/ board using the official 4chan API. This board is known for hosting discussions on various topics, often with a focus on political content. Due to the nature of the /pol/ board, the content may include offensive language, hate speech, or otherwise sensitive material. Users should exercise caution and consider ethical implications when analysing this dataset.
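The description does not say whether com retains the HTML markup that the 4chan API uses in its comment field (line-break tags, quote links, escaped entities). If it does, a cleaning step along these lines may help before analysis. This is a sketch under that assumption; clean_post is a hypothetical helper, not something shipped with the dataset.

```python
import html
import re

_BR = re.compile(r"<br\s*/?>", re.IGNORECASE)
_TAG = re.compile(r"<[^>]+>")


def clean_post(raw: str) -> str:
    """Convert an HTML-formatted comment to plain text."""
    text = _BR.sub("\n", raw)  # keep line breaks as newlines
    text = _TAG.sub("", text)  # drop remaining tags such as <span> or <a>
    # Decode entities last, so an escaped "&lt;" is never mistaken for a tag.
    return html.unescape(text).strip()
```

Posts that are already plain text pass through unchanged, so the function is safe to apply to every row.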

Potential Use Cases

  • Text analysis and natural language processing (NLP).
  • Studies on online discourse, extremism, or political polarization.
  • Research on language usage and sentiment in online forums.
  • Development and testing of machine learning models for text classification or moderation.
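As a first exploratory step for the text-analysis use cases above, one might count the most frequent terms across posts. The sketch below is illustrative only: top_terms is a hypothetical helper, and the lowercase alphanumeric tokenizer is a simplifying assumption rather than the method used in the paper.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumption: a token is a run of lowercase letters, digits, or apostrophes.
_TOKEN = re.compile(r"[a-z0-9']+")


def top_terms(posts: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent tokens across all posts with their counts."""
    counts: Counter = Counter()
    for post in posts:
        counts.update(_TOKEN.findall(post.lower()))
    return counts.most_common(n)
```

The function accepts any iterable of strings, so it can consume posts streamed from the CSV file without holding them all in memory.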

Example Data

Here’s an example of what a few rows of the dataset look like:

com
"Why does no one talk about this?"
"The government is hiding the truth!"
"We need to take action against this injustice."

If you find our dataset useful, please cite our paper:

@article{
}

Files

pol_500K4chan.csv

Files (113.1 MB)

Checksum Size
md5:f7adf7f42c51ad79359d79babad7fe3f 113.0 MB
md5:0560cad335c590c7b259cd430b4ac2db 75.1 kB
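The MD5 checksums listed above can be used to confirm that a download is intact. The sketch below computes a file's MD5 digest in chunks; md5_of_file is a hypothetical helper, and pairing the 113.0 MB checksum with pol_500K4chan.csv is an assumption based on file size, since the listing does not show file names.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Assumption: this checksum (listed with the 113.0 MB entry) belongs to the CSV file.
EXPECTED_MD5 = "f7adf7f42c51ad79359d79babad7fe3f"
```

After downloading, md5_of_file("pol_500K4chan.csv") == EXPECTED_MD5 should evaluate to True if the assumption holds and the file is complete.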