Published October 4, 2021 | Version v1
Conference paper · Open Access

SloBERTa: Slovene monolingual large pretrained masked language model

  • University of Ljubljana, Ljubljana, Slovenia

Description

Large pretrained language models, based on the transformer architecture, show excellent results in solving many natural language processing tasks. Research has mostly focused on the English language; however, many monolingual models for other languages have recently been trained. We trained the first such monolingual model for Slovene, based on the RoBERTa model. We evaluated the newly trained SloBERTa model on several classification tasks. The results show an improvement over existing multilingual and monolingual models and represent the current state of the art for Slovene.
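As a minimal illustration of what a RoBERTa-style masked language model such as SloBERTa does, the sketch below queries the model for the most likely fill of a masked token using the Hugging Face transformers library. The model identifier "EMBEDDIA/sloberta" is an assumption about the published checkpoint name and may differ from the actual release; the example sentence is illustrative only.

    from transformers import pipeline

    # Load a fill-mask pipeline for a RoBERTa-style masked language model.
    # The model name below is an assumed Hugging Face Hub identifier.
    fill_mask = pipeline("fill-mask", model="EMBEDDIA/sloberta")

    # Ask the model to fill the masked token in a Slovene sentence
    # ("Ljubljana is the capital <mask> of Slovenia.").
    predictions = fill_mask("Ljubljana je glavno <mask> Slovenije.")

    # Print each candidate token and its probability score.
    for p in predictions:
        print(f"{p['token_str']!r}: {p['score']:.3f}")

Classification tasks like those reported in the paper are typically handled by fine-tuning such a pretrained encoder with a task-specific classification head rather than by fill-mask prediction.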

Files

Ulcar+Robnik.pdf (396.9 kB)
md5:d63f3fb1a5c4c82b1a8f3574de07cbb7

Additional details

Funding

European Commission – EMBEDDIA: Cross-Lingual Embeddings for Less-Represented Languages in European News Media (grant no. 825153)