10.5281/zenodo.6389324
https://zenodo.org/records/6389324
oai:zenodo.org:6389324
Miok, Kristian
Kristian
Miok
University of Ljubljana, Ljubljana, Slovenia
Škrlj, Blaž
Blaž
Škrlj
Jožef Stefan Institute
Zaharie, Daniela
Daniela
Zaharie
Department of Computer Science, West University of Timisoara, Timisoara, Romania
Robnik-Šikonja, Marko
Marko
Robnik-Šikonja
University of Ljubljana, Ljubljana, Slovenia
Bayesian BERT for Trustful Hate Speech Detection
Zenodo
2022
2022-03-28
eng
10.5281/zenodo.6389323
https://zenodo.org/communities/embeddia
https://zenodo.org/communities/eu
Creative Commons Attribution 4.0 International
Hate speech is an important problem in the management of user-generated content. To remove offensive content or ban misbehaving users, content moderators need reliable hate speech detectors. Recently, deep neural networks based on the transformer architecture, such as the (multilingual) BERT model, have achieved superior performance in many natural language classification tasks, including hate speech detection. So far, these methods have not been able to quantify the reliability of their output. We propose a Bayesian method using Monte Carlo Dropout within the attention layers of transformer models to provide well-calibrated reliability estimates. We evaluate the introduced approach on hate speech detection problems in several languages. Our approach not only improves the classification performance of the state-of-the-art multilingual BERT model, but the computed reliability scores also significantly reduce the workload in the inspection of offending cases and in reannotation campaigns.
European Commission
10.13039/501100000780
825153
Cross-Lingual Embeddings for Less-Represented Languages in European News Media