A Labeled Spanish Twitter Dataset for Binary Cyberbullying Detection

Cumba-Armijos, Cumba-Armijos; Riofrío-Luzcando, Diego; Rodriguez Arboleda, Veronica Elizabeth; Carrión Jumbo, Joe

doi:10.5281/zenodo.18466670

Published August 1, 2022 | Version v1

Dataset Open

1. Universidad Internacional SEK

This dataset contains a Spanish-language Twitter corpus labeled for binary cyberbullying detection. It was collected using the Twitter API with Spanish language filtering and a geographic focus on Ecuador, and then manually annotated to support supervised learning experiments in hate speech / bullying detection and related NLP tasks.

The dataset is provided as a single semicolon-separated CSV file (CorpusBullying.csv) with three fields: a unique tweet identifier (ID), the cleaned tweet text (SpanishTweet), and a binary label (Label), where 1 indicates bullying/cyberbullying content (e.g., insults, severe verbal aggression, discriminatory attacks) and 0 indicates non-bullying. The tweet text distributed in the file is preprocessed (lowercased and cleaned by removing links, user mentions, special characters, and Spanish stop words) to facilitate direct use in machine learning pipelines.
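As an illustration of the file layout described above, the following is a minimal sketch of a loader built on Python's standard `csv` module. It assumes the file starts with a header row naming the three fields (`ID`, `SpanishTweet`, `Label`) and is UTF-8 encoded; neither detail is confirmed by the description, so check the README before relying on it. The function name `load_corpus` is hypothetical.

```python
import csv
from pathlib import Path


def load_corpus(path, encoding="utf-8-sig"):
    """Read the semicolon-separated corpus into (id, text, label) tuples.

    Assumptions (not confirmed by the dataset description): the first row
    is a header with the columns ID, SpanishTweet and Label, and the file
    is UTF-8. "utf-8-sig" also accepts plain UTF-8 and strips a BOM if any.
    """
    rows = []
    with Path(path).open(newline="", encoding=encoding) as handle:
        reader = csv.DictReader(handle, delimiter=";")
        for record in reader:
            label = int(record["Label"])
            # The description defines only two classes: 1 (bullying), 0 (non-bullying).
            if label not in (0, 1):
                raise ValueError(f"unexpected label {label!r} for tweet {record['ID']}")
            rows.append((record["ID"], record["SpanishTweet"], label))
    return rows
```

Calling `load_corpus("CorpusBullying.csv")` on the downloaded file should then yield one tuple per tweet, with the identifier kept as a string so that long tweet IDs are not altered.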

The corpus includes 83,400 labeled tweets, with 16,247 bullying instances and 67,153 non-bullying instances. It can be used to benchmark text classification models (e.g., CNN/RNN/Transformer architectures), study class imbalance strategies, and compare feature-based and deep learning approaches for cyberbullying detection in Spanish.
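The reported counts imply that roughly 19.5% of the tweets are labeled as bullying, a ratio of about 4.1 non-bullying tweets per bullying tweet. One common way to account for this imbalance is inverse-frequency class weighting. The sketch below computes such weights from the published counts; it is only one possible strategy and is not described as the authors' method. The function name `balanced_class_weights` is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return the weight n / (k * n_c) for each class c.

    n is the number of samples, k the number of classes, and n_c the
    number of samples in class c (the usual "balanced" heuristic).
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}


# Label counts taken from the dataset description:
# 16,247 bullying (1) and 67,153 non-bullying (0) tweets.
labels = [1] * 16_247 + [0] * 67_153
weights = balanced_class_weights(labels)
positive_share = 16_247 / len(labels)
```

The resulting dictionary can be passed to any training routine that accepts per-class weights, so that errors on the minority bullying class count for more than errors on the majority class.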

Files (8.2 MB)

Name	MD5	Size
CorpusBullying.csv	4c9b6f4538a7db1e074caeb9633dc037	8.2 MB
README_CorpusBullying.md	7c2f0f2b2407314013c443a5aecdbfa3	2.2 kB
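
The MD5 checksums listed with the files can be used to confirm that a download is complete and unmodified. The following is a minimal sketch using Python's standard `hashlib`; the helper name `md5_of_file` and the local file paths are assumptions made for illustration.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Checksums as published in the file listing.
EXPECTED_MD5 = {
    "CorpusBullying.csv": "4c9b6f4538a7db1e074caeb9633dc037",
    "README_CorpusBullying.md": "7c2f0f2b2407314013c443a5aecdbfa3",
}


def verify_download(path, name):
    """Return True if the file at `path` matches the published checksum for `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, `verify_download("CorpusBullying.csv", "CorpusBullying.csv")` should return `True` for an intact copy saved in the current directory.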
