EsCorpiusBias: Contextual Annotation and Transformer-Based Detection of Racism and Sexism in Spanish Dialogues
Creators
Description
The rise of online communication platforms has significantly increased exposure to harmful discourse, presenting ongoing challenges for digital moderation and user well-being. This paper introduces the EsCorpiusBias corpus, designed to improve automated detection of sexism and racism in Spanish-language online dialogues sourced from the Mediavida forum. Following a systematic, context-sensitive annotation protocol, approximately 1,000 three-turn dialogue units per bias category were annotated, so that labels reflect pragmatic and conversational subtleties rather than isolated messages. The annotation guidelines cover both explicit and implicit manifestations of sexism and racism. Annotations were performed with the Prodigy tool and reached moderate to substantial inter-annotator agreement (Cohen's Kappa: 0.55 for sexism, 0.79 for racism). Logistic Regression, spaCy's baseline n-gram bag-of-words model, and the transformer-based BETO model were trained and evaluated, demonstrating that contextualized transformer-based approaches significantly outperform baseline and general-purpose models. Additionally, lexical overlap analyses indicated a strong reliance on explicit lexical indicators, highlighting limitations in handling implicit biases. This research underscores the importance of contextually grounded, domain-specific fine-tuning for effective automated detection of toxicity, providing resources and methodologies to foster socially responsible NLP systems within Spanish-speaking online communities.
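As an illustration of the agreement statistic reported above, the following is a minimal sketch of how Cohen's Kappa can be computed for two annotators. The function name `cohen_kappa` and the toy label lists are hypothetical and are not taken from the corpus or the authors' code; the sketch assumes each annotator assigned exactly one label to every dialogue unit.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label distribution.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)

    # Observed agreement: share of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of the annotators' marginal label rates,
    # summed over all labels (a missing label counts as zero).
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if p_e == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy example (invented labels): 1 = biased, 0 = not biased.
annotator_a = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
annotator_b = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
kappa = cohen_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.2f}")
```

In this toy example the annotators agree on 8 of 10 items, yet the chance-corrected score is noticeably lower than the raw 0.8 agreement, which is why Kappa rather than raw agreement is the usual figure to report for annotation quality.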
Notes
Files
labels.txt
Additional details
Related works
- Is part of
- Dataset: 10.5281/zenodo.15023855 (DOI)