Published March 21, 2025 | Version v1
Dataset Restricted

Messages from alternative Spanish Telegram channels, 2019-2024

Description

This dataset contains processed data extracted with pytopicgram from Telegram channels, covering the period from 2019-12-01 to 2024-08-31. It includes anonymized channel information, sampled messages, and topics identified using BERTopic. The data is structured for ease of analysis. The dataset comprises two main CSV files:
 

1. Topics (topics.csv)

This file contains topics extracted from the full dataset using BERTopic. Each topic is described by a concise text generated by OpenAI o1.

Column Name      Description
Topic            Numeric identifier for each topic. -1 is the generic topic for non-assignable messages.
Name             Human-readable name summarizing the topic.
Representation   List of representative keywords for the topic.
Description      Concise description of the topic generated by OpenAI.
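
As a rough illustration of how topics.csv could be read, the sketch below parses a file with the four columns listed above and sets aside topic -1. The sample rows are invented placeholders, not records from the dataset, and the assumption that Representation is stored as a Python-style list literal is not confirmed by this record.

```python
import ast
import csv
import io

# Hypothetical stand-in for topics.csv: these rows are invented for illustration.
SAMPLE_TOPICS_CSV = """Topic,Name,Representation,Description
-1,-1_example_generic,"['example', 'generic']",Placeholder description for unassigned messages.
0,0_example_topic,"['example', 'topic']",Placeholder description for topic 0.
"""


def load_topics(handle):
    """Return {topic_id: row}, with Topic as int and Representation as a list."""
    topics = {}
    for row in csv.DictReader(handle):
        row["Topic"] = int(row["Topic"])
        # Assumption: keywords are serialized as a Python-style list literal.
        row["Representation"] = ast.literal_eval(row["Representation"])
        topics[row["Topic"]] = row
    return topics


topics = load_topics(io.StringIO(SAMPLE_TOPICS_CSV))

# Topic -1 is the generic topic for non-assignable messages, so it is
# often excluded before comparing topics with each other.
assigned_topics = {tid: row for tid, row in topics.items() if tid != -1}
print(sorted(assigned_topics))
```

With the real file, the `io.StringIO(...)` argument would be replaced by an open file handle, for example `open("topics.csv", newline="", encoding="utf-8")`.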

2. Messages (messages.csv)

This file contains a 25% sample of messages from Telegram channels, stratified by the topics column.
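
To make the idea of a sample stratified by topic concrete, here is a minimal sketch that draws the same fraction from every topic group. It is not the authors' sampling code: the `stratified_sample` helper, the fixed seed, the rule of keeping at least one row per group, and the synthetic rows are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(rows, key, fraction, seed=0):
    """Sample `fraction` of the rows within each group defined by `key`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    sample = []
    for members in groups.values():
        # Assumption: keep at least one row per group so small topics survive.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Synthetic rows: 40 messages in topic 0, 20 in topic 1, 8 in topic -1.
rows = [
    {"message": i, "topics": topic}
    for i, topic in enumerate([0] * 40 + [1] * 20 + [-1] * 8)
]

sample = stratified_sample(rows, key="topics", fraction=0.25)
counts = Counter(row["topics"] for row in sample)
print(dict(counts))
```

Sampling within each topic keeps the topic proportions of the sample close to those of the full collection, which a simple random sample of the same size would only approximate.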

Column Name   Description
channel_id    Anonymized identifier for the Telegram channel.
week_year     Week and year when the message was posted (format: week_year).
media_type    Type of media included in the message (txt, img, video, audio, doc, web).
reach         Number of users reached by the message.
virality      Virality score of the message.
is_viral      Boolean indicating whether the message is considered viral.
topics        Topic identifier associated with the message.
probs         Probability scores for topic assignment.
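
The sketch below shows one way the columns of messages.csv might be parsed and combined with topic names. The sample rows and topic names are invented placeholders, and several details are assumptions rather than documented facts: that week_year looks like `12_2021`, that is_viral is serialized as `True`/`False`, and that probs holds a single number per message.

```python
import csv
import io

# Hypothetical stand-in for messages.csv: these rows are invented for illustration.
SAMPLE_MESSAGES_CSV = """channel_id,week_year,media_type,reach,virality,is_viral,topics,probs
c1,12_2021,txt,1500,0.8,False,0,0.91
c2,12_2021,video,42000,3.2,True,-1,0.0
c1,13_2021,img,900,0.4,False,0,0.77
"""

# Placeholder names; in practice these would come from the Name column of topics.csv.
TOPIC_NAMES = {-1: "unassigned (placeholder)", 0: "example topic (placeholder)"}


def parse_message(row):
    """Convert one CSV row of strings into typed values."""
    week, year = row["week_year"].split("_")  # assumed layout: week_year
    return {
        "channel_id": row["channel_id"],
        "week": int(week),
        "year": int(year),
        "media_type": row["media_type"],
        "reach": float(row["reach"]),
        "virality": float(row["virality"]),
        "is_viral": row["is_viral"].strip().lower() == "true",
        "topic": int(row["topics"]),
        "prob": float(row["probs"]),  # assumed to be a single score
    }


reader = csv.DictReader(io.StringIO(SAMPLE_MESSAGES_CSV))
messages = [parse_message(row) for row in reader]

# Drop messages in the generic topic -1 before topic-level analysis.
assigned = [m for m in messages if m["topic"] != -1]
viral_share = sum(m["is_viral"] for m in messages) / len(messages)

for m in assigned:
    print(m["year"], m["week"], TOPIC_NAMES[m["topic"]], m["reach"])
```

If probs turns out to hold a list of scores per message, the `float(...)` conversion would need to be replaced by a parser suited to that serialization.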

 

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Additional details

Related works

Is supplement to
10.5281/zenodo.14889387 (DOI)

Funding

CaixaBank
U-MIND SR21-00684