# EsCorpiusBias Dataset

A curated corpus of Spanish-language forum dialogues annotated for sexism and racism. The data files use the JSON Lines (JSONL) format: each line is one JSON object holding the full comment text, its binary label, and associated metadata.

---

## Overview

EsCorpiusBias is a multi-turn dialogue dataset extracted from the Mediavida forum, annotated for two primary bias categories:

- **Sexism**
- **Racism**

Each example comprises:
- The raw comment text (three-turn dialogues were used during annotation, but only the target comment appears here).
- A binary label: `Sexism` or `Non-Sexism` for comments in the sexism file, and `Racism` or `Non-Racism` for comments in the racism file.
- A `meta` block containing any additional identifiers (e.g., article or comment IDs).
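
A common first step when training a classifier is to turn these string labels into 0/1 targets. The Python sketch below assumes the label strings are the four listed in `labels.txt`. The helper name `to_binary` is hypothetical. Folding the Unicode hyphen U+2010 into an ASCII hyphen is also an assumption, so that either spelling of the `Non-` labels is accepted.

```python
# Hypothetical helper: map EsCorpiusBias label strings to binary targets.
POSITIVE = {"Sexism", "Racism"}
NEGATIVE = {"Non-Sexism", "Non-Racism"}


def to_binary(label: str) -> int:
    """Return 1 for a sexist/racist comment and 0 otherwise."""
    # Assumption: treat U+2010 (HYPHEN) and ASCII "-" as the same character.
    normalized = label.strip().replace("\u2010", "-")
    if normalized in POSITIVE:
        return 1
    if normalized in NEGATIVE:
        return 0
    # Fail loudly instead of silently mapping an unknown label.
    raise ValueError(f"unexpected label: {label!r}")
```

For instance, `to_binary("Non-Racism")` returns `0` and `to_binary("Sexism")` returns `1`. Any other string raises `ValueError` rather than being mapped to a default class.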

Annotation was carried out with Prodigy following detailed guidelines; see our paper for the full annotation protocol and category definitions.

---

## Files

```text
EsCorpiusBias/
├── mediavida_sexism.jsonl
├── mediavida_racism.jsonl
├── labels.txt
└── README.md
```

- **mediavida_sexism.jsonl**  
  All annotated examples for sexism. One JSON object per line.
- **mediavida_racism.jsonl**  
  All annotated examples for racism. One JSON object per line.
- **labels.txt**  
  List of valid labels (one per line):
  - `Sexism`
  - `Non-Sexism`
  - `Racism`
  - `Non-Racism`
- **README.md**  
  This file.
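
To catch typos or unexpected values before training, one option is to compare the labels found in a data file against `labels.txt`. The following is a minimal Python 3.9+ sketch. The names `parse_labels` and `unexpected_labels` are hypothetical, and both functions take plain iterables, so they work on open files as well as in-memory lists.

```python
from typing import Iterable


def parse_labels(lines: Iterable[str]) -> list[str]:
    """Return the labels listed in labels.txt, one per non-blank line."""
    return [line.strip() for line in lines if line.strip()]


def unexpected_labels(found: Iterable[str], allowed: Iterable[str]) -> set[str]:
    """Return every label in `found` that is not in `allowed`."""
    return set(found) - set(allowed)
```

Typical use would be `with open("EsCorpiusBias/labels.txt", encoding="utf-8") as f: allowed = parse_labels(f)`, followed by `unexpected_labels(labels_from_jsonl, allowed)`. Here `labels_from_jsonl` stands for the `label` values read from one of the `.jsonl` files. The comparison is exact, so a label that differs only in its hyphen character (ASCII `-` versus U+2010 `‐`) is reported as unexpected.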

---

## Data Format

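Each line in the `.jsonl` files is a JSON object with three fields: `text` (the comment, as a string), `label` (one of the values listed in `labels.txt`), and `meta` (an object holding identifiers such as `article_id` and `comment_id`). For example: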

```jsonc
{
  "text": "Allá vamos.\n\nRecien instalado W10\n\n#26 me pasas el wal xD mola mucho xD",
  "label": "Non-Racism",
  "meta": {
    "article_id": "a542752",
    "comment_id": 30
  }
}
```
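
This format can be read with Python's standard `json` module and nothing else. The sketch below parses one object per line, skips blank lines, and checks that the three documented fields are present. The function name `iter_records` is hypothetical. The commented usage also assumes the files are UTF-8 encoded, since the comments contain accented Spanish characters.

```python
import json
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-blank JSONL line."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate empty lines, e.g. a stray blank line at the end
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        # Minimal schema check: text, label and meta must all be present.
        missing = {"text", "label", "meta"} - set(record)
        if missing:
            raise ValueError(f"line {lineno}: missing fields {sorted(missing)}")
        yield record


# Example usage (path assumed from the Files section, encoding assumed UTF-8):
# with open("EsCorpiusBias/mediavida_racism.jsonl", encoding="utf-8") as f:
#     for rec in iter_records(f):
#         print(rec["label"], rec["meta"].get("comment_id"))
```

Because `iter_records` is a generator, it streams the file line by line instead of loading it all into memory. Wrap it in `list(...)` when random access is needed.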

## Annotation Guidelines

The full annotation guidelines, including the definitions of sexism and racism applied by annotators, are documented in the accompanying paper, which also reports inter-annotator agreement.

## Citation

If you use this dataset in your research, please cite the following:

```bibtex
@article{EsCorpiusBias2025,
  title        = {EsCorpiusBias: Contextual Annotation and Transformer-Based Detection of Racism and Sexism in Spanish Dialogues},
  author       = {Kharitonova, Ksenia and Callejas, Zoraida and Griol, David and Gutiérrez-Fandiño, Asier and Gutiérrez-Hernando, Javier and Pérez-Fernández, David},
  journal      = {NLP},
  year         = {2025},
  volume       = {1},
  number       = {1},
  pages        = {0},
  doi          = {10.3390/xxxxx},
  url          = {https://doi.org/10.3390/xxxxx},
}
```
