Code for "Towards Robust Online Sexism Detection: A Multi-Model Approach with BERT, XLM-RoBERTa, and DistilBERT for EXIST 2023 Tasks"
- 1. Utrecht University
Description
Towards Robust Online Sexism Detection: A Multi-Model Approach with BERT, XLM-RoBERTa, and DistilBERT for EXIST 2023 Tasks
Hadi Mohammadi¹,∗, Anastasia Giachanou¹ and Ayoub Bagheri¹
¹Department of Methodology and Statistics, Utrecht University, The Netherlands.
Abstract
This research investigates the application of pre-trained transformer-based models, including BERT, XLM-RoBERTa, and DistilBERT, to the EXIST 2023 shared task, which focuses on identifying and categorizing online sexism. The study emphasizes the crucial role of Natural Language Processing (NLP) in detecting harmful content and draws on previous competitions that included tasks on detecting hate speech and abusive language. The methodology combines several advanced text classification techniques, including the use of additional datasets, data preprocessing, and model building. The research also explores data augmentation and label encoding as preprocessing steps. The findings indicate that the developed model performs best on English data, and suggest that using a voting system to combine the outputs of multiple models contributes to the overall performance. The research concludes with a call for sustained initiatives to curb the prevalence of harmful content on digital platforms and outlines directions for future work, including incorporating additional information about annotators, assessing annotator reliability, and exploring more sophisticated techniques for handling class imbalance.
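The abstract mentions label encoding and a voting system that combines the outputs of several models, but it does not specify the exact scheme. The following is a minimal sketch that assumes hard (majority) voting over integer-encoded labels for a binary sexism-identification task. The label set (`"NO"`, `"YES"`), the model names, and the helpers `encode_labels` and `majority_vote` are hypothetical illustrations and are not taken from the released notebook.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical label set, assumed here for a binary sexism-identification task.
LABELS = ["NO", "YES"]
LABEL_TO_ID = {label: i for i, label in enumerate(LABELS)}
ID_TO_LABEL = {i: label for label, i in LABEL_TO_ID.items()}


def encode_labels(labels: Sequence[str]) -> List[int]:
    """Map string labels to integer ids (simple label encoding)."""
    return [LABEL_TO_ID[label] for label in labels]


def majority_vote(predictions: Dict[str, Sequence[int]]) -> List[int]:
    """Combine per-model predicted label ids by hard majority voting.

    `predictions` maps a (hypothetical) model name to its predicted label ids,
    one per example. Ties go to the earliest model, in dict order, whose vote
    has the highest count.
    """
    if not predictions:
        raise ValueError("at least one model's predictions are required")
    model_names = list(predictions)
    n_examples = len(predictions[model_names[0]])
    if any(len(p) != n_examples for p in predictions.values()):
        raise ValueError("all models must predict the same number of examples")

    combined = []
    for i in range(n_examples):
        votes = [predictions[name][i] for name in model_names]
        counts = Counter(votes)
        top = max(counts.values())
        # Tie-break: first vote (in model order) that reaches the top count.
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


# Usage with made-up predictions from three models.
preds = {
    "bert": encode_labels(["YES", "NO", "NO"]),
    "xlm-roberta": encode_labels(["YES", "YES", "NO"]),
    "distilbert": encode_labels(["NO", "YES", "YES"]),
}
print([ID_TO_LABEL[i] for i in majority_vote(preds)])  # ['YES', 'YES', 'NO']
```

Hard voting needs only each model's predicted label. A soft-voting variant would instead average the per-class probabilities from each model, which can matter when models disagree with different levels of confidence.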
Keywords [1]
Online Sexism, Natural Language Processing (NLP), Transformer-based Models, BERT.
[1] CLEF 2023: Conference and Labs of the Evaluation Forum, September 18–21, 2023, Thessaloniki, Greece
EMAIL: h.mohammadi@uu.nl (H. Mohammadi); a.giachanou@uu.nl (A. Giachanou); a.bagheri@uu.nl (A. Bagheri)
ORCID: 0000-0003-0860-9200 (H. Mohammadi)
© 2023 Copyright for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)
Data Availability
The dataset used in this article is the one associated with the EXIST 2023 competition. The code used in the study is made available through a dedicated Zenodo record.
Files
| Name | Size | Checksum |
|---|---|---|
| M&S_NLP technical report for Exist 2023 (new version).ipynb | 868.2 kB | md5:4ac8569af1bf80bd9d3f65ad30c526b2 |