Leader: 00000nam##2200000uu#4500
Record ID: 6390721
DOI: 10.5281/zenodo.6390721
OAI identifier: oai:zenodo.org:6390721
Communities: user-embeddia, user-eu
Title: Bayesian BERT for Trustful Hate Speech Detection
Author: Miok, Kristian (University of Ljubljana, Ljubljana, Slovenia)
Additional authors:
  Škrlj, Blaž (Jožef Stefan Institute)
  Zaharie, Daniela (Department of Computer Science, West University of Timisoara, Timisoara, Romania)
  Robnik-Šikonja, Marko (University of Ljubljana, Ljubljana, Slovenia)
Access rights: info:eu-repo/semantics/openAccess
License: Creative Commons Attribution 4.0 International
License URL: https://creativecommons.org/licenses/by/4.0/legalcode
License identifier: cc-by-4.0 (spdx)
Abstract: Hate speech is an important problem in the management of user-generated content. To remove offensive content or ban misbehaving users, content moderators need reliable hate speech detectors. Recently, deep neural networks based on the transformer architecture, such as the (multilingual) BERT model, have achieved superior performance in many natural language classification tasks, including hate speech detection. So far, these methods have not been able to quantify the reliability of their output. We propose a Bayesian method using Monte Carlo Dropout within the attention layers of the transformer models to provide well-calibrated reliability estimates. We evaluate the introduced approach on hate speech detection problems in several languages. Our approach improves the classification performance of the state-of-the-art multilingual BERT model, and the computed reliability scores significantly reduce the workload of inspecting offending cases and of reannotation campaigns.
Language: eng
Publisher: Zenodo
Publication date: 2022-03-28
Resource type: info:eu-repo/semantics/conferencePaper
Grant: 825153, Cross-Lingual Embeddings for Less-Represented Languages in European News Media
Record last modified: 20220329014947.0
File: https://zenodo.org/records/6390721/files/ICML_UDL_2020.pdf
File size: 592472
File checksum: md5:a4e514858849c5cfdde83e31b365c240
File access: open
Related identifier: 10.5281/zenodo.6389323 (isVersionOf, doi)