Published September 1, 2020 | Version v1
Conference paper | Open Access

FinEst BERT and CroSloEngual BERT: less is more in multilingual models

  • University of Ljubljana, Ljubljana, Slovenia

Description

Large pre-trained masked language models have become state-of-the-art solutions for many NLP problems. However, the research has mostly focused on the English language. While massively multilingual models exist, studies have shown that monolingual models produce much better results. We train two trilingual BERT-like models, one for Finnish, Estonian, and English, the other for Croatian, Slovenian, and English. We evaluate their performance on several downstream tasks (NER, POS tagging, and dependency parsing), using multilingual BERT and XLM-R as baselines. The newly created FinEst BERT and CroSloEngual BERT improve the results on all tasks in most monolingual and cross-lingual situations.
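For reference, below is a minimal sketch of how a released model of this kind could be loaded for masked-language-model inference with the Hugging Face transformers library. The hub identifier "EMBEDDIA/crosloengual-bert" is an assumption about where the model is hosted, not something stated in this record.

    # Minimal sketch; assumes the model is published on the Hugging Face hub
    # under an identifier such as "EMBEDDIA/crosloengual-bert".
    from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

    model_name = "EMBEDDIA/crosloengual-bert"  # assumed hub identifier

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)

    # Fill-mask inference as a quick sanity check in any of the model's
    # three languages (Croatian, Slovenian, English).
    fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
    for prediction in fill_mask("Ljubljana is the capital of [MASK]."):
        print(prediction["token_str"], round(prediction["score"], 3))

The same pattern applies to the Finnish-Estonian-English model by swapping in its hub identifier; for the downstream tasks named above (NER, POS tagging, dependency parsing), the pre-trained weights would typically be fine-tuned with a task-specific head rather than used via fill-mask.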

Files

Ulčar-Robnik-Šikonja2020_Chapter_FinEstBERTAndCroSloEngualBERT.pdf (237.4 kB)

Additional details

Funding

EMBEDDIA – Cross-Lingual Embeddings for Less-Represented Languages in European News Media (Grant No. 825153)
European Commission