There is a newer version of the record available.

Published March 28, 2022 | Version v1
Conference paper Open

Bayesian BERT for Trustful Hate Speech Detection

  • 1. University of Ljubljana, Ljubljana, Slovenia
  • 2. Jožef Stefan Institute
  • 3. Department of Computer Science, West University of Timisoara, Timisoara, Romania

Description

Hate speech is an important problem in the management of user-generated content. To remove offensive content or ban misbehaving users, content moderators need reliable hate speech detectors. Recently, deep neural networks based on the transformer architecture, such as the (multilingual) BERT model, have achieved superior performance in many natural language classification tasks, including hate speech detection. So far, these methods have not been able to quantify the reliability of their outputs. We propose a Bayesian method that uses Monte Carlo Dropout within the attention layers of transformer models to provide well-calibrated reliability estimates. We evaluate the introduced approach on hate speech detection problems in several languages. Our approach not only improves the classification performance of the state-of-the-art multilingual BERT model, but the computed reliability scores also significantly reduce the workload in the inspection of offending cases and in reannotation campaigns.
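
The general idea behind Monte Carlo Dropout mentioned in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the paper applies dropout inside the attention layers of BERT, whereas the toy below applies it to the inputs of a small logistic classifier, purely as an assumption for illustration. The function names (`dropout`, `forward`, `mc_dropout_predict`), the weights, and the dropout rate are all hypothetical. The sketch keeps dropout active at prediction time, runs several stochastic forward passes, and reports the mean probability together with its variance as a rough reliability signal.

```python
import math
import random


def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def forward(features, weights, bias, rate, rng):
    """One stochastic pass of a toy logistic classifier (hypothetical stand-in for BERT)."""
    dropped = dropout(features, rate, rng)
    logit = sum(w * x for w, x in zip(weights, dropped)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def mc_dropout_predict(features, weights, bias, rate=0.1, passes=100, seed=0):
    """Average `passes` stochastic forward passes with dropout left switched on.

    Returns (mean probability, variance across passes). A larger variance
    suggests a less reliable prediction.
    """
    rng = random.Random(seed)
    probs = [forward(features, weights, bias, rate, rng) for _ in range(passes)]
    mean = sum(probs) / passes
    variance = sum((p - mean) ** 2 for p in probs) / passes
    return mean, variance


if __name__ == "__main__":
    # Illustrative values only; they do not come from the paper.
    mean, variance = mc_dropout_predict([1.0, 2.0], [2.0, -1.5], 0.1, rate=0.5, passes=200)
    print(f"mean probability: {mean:.3f}, variance: {variance:.4f}")
```

In a moderation setting, predictions with high variance across passes could be routed to a human reviewer first, which is one way such scores might reduce inspection workload.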

Files

Miok.pdf (581.3 kB)
md5:5e8e533a690f9a2078dc4c59b6c7eceb

Additional details

Funding

European Commission: EMBEDDIA – Cross-Lingual Embeddings for Less-Represented Languages in European News Media (grant 825153)