Published July 28, 2025 | Version v1
Dataset Restricted

Dataset for: "Exploring Left-Wing Extremism on the Decentralized Web: An Analysis of Lemmygrad.ml"

  • 1. Binghamton University
  • 2. Cyprus University of Technology

Description

This repository contains the dataset, along with the source code used to produce the main findings of the paper, "Exploring Left-Wing Extremism on the Decentralized Web: An Analysis of Lemmygrad.ml."

Dataset Overview

This dataset consists of submissions and comments collected from Lemmygrad.ml communities between August 2019 and April 2022. To protect user privacy, all post identifiers and author names are anonymized. In addition, every word beginning with u/ has been replaced with u/anonymized_author_name, and every word beginning with @ has been replaced with @anonymized_at_word.
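The word-level replacement described above can be illustrated with a minimal Python sketch. This is not the authors' anonymization code: the regular expressions and the function name `anonymize_mentions` are assumptions, and a "word" is taken here to mean a whitespace-delimited token.

```python
import re

# Assumption: a "word" is a whitespace-delimited token. The lookbehind
# ensures the match starts at the beginning of a token, so an "@" in the
# middle of a token (e.g. an e-mail address) is left untouched.
U_WORD = re.compile(r"(?<!\S)u/\S+")
AT_WORD = re.compile(r"(?<!\S)@\S+")


def anonymize_mentions(text: str) -> str:
    """Replace tokens starting with u/ or @ by fixed placeholders."""
    text = U_WORD.sub("u/anonymized_author_name", text)
    return AT_WORD.sub("@anonymized_at_word", text)


example = anonymize_mentions("thanks u/alice and @bob for the link")
```

When working with the released text, it may help to treat these two placeholder strings as stop words so that they do not dominate token frequency counts.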

The dataset includes the files listed below.

File Structure


├── analysis.ipynb                                # Jupyter notebook with analysis
└── data/
    ├── lemmygradml_dataset.ndjson                # Main dataset
    ├── lemmygradml_perspective_api_scores.jsonl  # Perspective API scores
    ├── id_to_topic.jsonl                         # Topic assignments
    └── topic_keywords.json                       # Topic keywords dictionary

Dataset Files

1. Main Dataset (`lemmygradml_dataset.ndjson`)
Fields:
- `id`: Unique post identifier
- `author`: Author username
- `date`: Post date
- `community`: Community name
- `title`: Post title 
- `post`: Post content
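
As a rough sketch of how the main dataset could be read, the snippet below parses newline-delimited JSON one record per line. The helper name `read_ndjson` and the sample records are hypothetical; in particular, the date format and field values shown are placeholders, not taken from the actual file.

```python
import json
from typing import Iterable, Iterator


def read_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line of newline-delimited JSON."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline at end of file
            yield json.loads(line)


# Hypothetical records using the documented field names.
sample_lines = [
    '{"id": "p1", "author": "a1", "date": "2021-05-01", '
    '"community": "c1", "title": "t1", "post": "first post"}',
    "",
    '{"id": "p2", "author": "a2", "date": "2021-05-02", '
    '"community": "c1", "title": "t2", "post": "second post"}',
]
records = list(read_ndjson(sample_lines))

# With access to the restricted files, the same helper accepts a file object:
# with open("data/lemmygradml_dataset.ndjson", encoding="utf-8") as f:
#     records = list(read_ndjson(f))
```

Because the function consumes any iterable of strings lazily, the file does not need to fit in memory when records are processed one at a time.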


2. Toxicity Scores (`lemmygradml_perspective_api_scores.jsonl`)
Fields:
- `id`: Post identifier (matches main dataset)
- `toxicity`: Toxicity score (0-1)
- `severe_toxicity`: Severe toxicity score (0-1)
- `identity_attack`: Identity attack score (0-1)
- `profanity`: Profanity score (0-1)
- `threat`: Threat score (0-1)
- `insult`: Insult score (0-1)
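
Since the `id` field matches the main dataset, the scores can be joined to posts with a dictionary lookup. The following is a minimal sketch under stated assumptions: the function name `attach_scores`, the sample rows, and the 0.5 cutoff are all hypothetical and are not values taken from the paper.

```python
def attach_scores(posts, scores):
    """Merge each post with its score row; posts without scores are skipped."""
    scores_by_id = {row["id"]: row for row in scores}
    merged = []
    for post in posts:
        score_row = scores_by_id.get(post["id"])
        if score_row is not None:
            merged.append({**post, **score_row})
    return merged


# Hypothetical rows using the documented field names.
posts = [
    {"id": "p1", "post": "first post"},
    {"id": "p2", "post": "second post"},
    {"id": "p3", "post": "third post"},
]
scores = [
    {"id": "p1", "toxicity": 0.9, "insult": 0.7},
    {"id": "p2", "toxicity": 0.1, "insult": 0.05},
]

TOXICITY_CUTOFF = 0.5  # illustrative threshold only, not from the paper
merged = attach_scores(posts, scores)
flagged_ids = [row["id"] for row in merged if row["toxicity"] >= TOXICITY_CUTOFF]
```

Skipping posts that have no score row is one possible design choice; whether every post in the main dataset has a score is not stated in this description.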


3. Topic Assignments (`id_to_topic.jsonl`)
Fields:
- `id`: Post identifier (matches main dataset)
- `topic`: Topic ID (0-164)


4. Topic Keywords (`topic_keywords.json`)

Content: Dictionary mapping topic IDs to lists of representative keywords
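
The two topic files can be combined to look up the keywords for a given post. The sketch below assumes that `topic_keywords.json` is a single JSON object keyed by topic ID; JSON object keys are always strings, so the topic ID is converted with `str()` before the lookup. The function name `keywords_for_post` and all sample values are hypothetical.

```python
import json


def keywords_for_post(post_id, id_to_topic, topic_keywords):
    """Return the keyword list for a post's topic, or [] if unknown."""
    topic = id_to_topic.get(post_id)
    if topic is None:
        return []
    # JSON object keys are strings, so normalize the topic ID first.
    return topic_keywords.get(str(topic), [])


# Hypothetical contents mirroring the documented structure.
assignment_lines = [
    '{"id": "p1", "topic": 3}',
    '{"id": "p2", "topic": 164}',
]
id_to_topic = {row["id"]: row["topic"] for row in map(json.loads, assignment_lines)}
topic_keywords = json.loads('{"3": ["kw_a", "kw_b"], "164": ["kw_c"]}')

p1_keywords = keywords_for_post("p1", id_to_topic, topic_keywords)
```

Normalizing with `str()` works whether the `topic` field in `id_to_topic.jsonl` is stored as a number or as a string.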

If you use this dataset in a publication of any kind, please cite it using the following BibTeX entry:

@inproceedings{balci2024exploring,
  title={Exploring Left-Wing Extremism on the Decentralized Web: An Analysis of Lemmygrad.ml},
  author={Balci, Utkucan and Sirivianos, Michael and Blackburn, Jeremy},
  booktitle={Workshop Proceedings of the 18th International AAAI Conference on Web and Social Media-Workshop: DeWeb},
  volume={2024},
  pages={1st},
  year={2024}
}

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Additional details

Funding

European Commission
MedDMO 101083756