Published October 4, 2021 | Version v1
Conference paper · Open Access

SloBERTa: Slovene monolingual large pretrained masked language model

  • University of Ljubljana, Ljubljana, Slovenia

Description

Large pretrained language models, based on the transformer architecture, show excellent results in solving many natural language processing tasks. Research has mostly focused on the English language; however, many monolingual models for other languages have recently been trained. We trained the first such monolingual model for Slovene, based on the RoBERTa model. We evaluated the newly trained SloBERTa model on several classification tasks. The results show an improvement over existing multilingual and monolingual models and represent the current state of the art for Slovene.
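As a minimal illustration of what a RoBERTa-style masked language model such as SloBERTa does, the sketch below queries the model for the most likely fill of a masked token using the Hugging Face transformers library. The model identifier "EMBEDDIA/sloberta" is an assumption about the published checkpoint name and may differ from the actual release; the example sentence is illustrative only.

    from transformers import pipeline

    # Load a fill-mask pipeline for a RoBERTa-style masked language model.
    # The model name below is an assumed Hugging Face Hub identifier.
    fill_mask = pipeline("fill-mask", model="EMBEDDIA/sloberta")

    # Ask the model to fill the masked token in a Slovene sentence
    # ("Ljubljana is the capital <mask> of Slovenia.").
    predictions = fill_mask("Ljubljana je glavno <mask> Slovenije.")

    # Print each candidate token and its probability score.
    for p in predictions:
        print(f"{p['token_str']!r}: {p['score']:.3f}")

Classification tasks like those reported in the paper are typically handled by fine-tuning such a pretrained encoder with a task-specific classification head rather than by fill-mask prediction.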

Files

Ulcar+Robnik.pdf (396.9 kB)
md5:d63f3fb1a5c4c82b1a8f3574de07cbb7

Additional details

Funding

European Commission – EMBEDDIA: Cross-Lingual Embeddings for Less-Represented Languages in European News Media (grant no. 825153)