Using a Frustratingly Easy Domain and Tagset Adaptation for Creating Slavic Named Entity Recognition Systems

doi:10.5281/zenodo.4730478

Published April 30, 2021 | Version v1

Conference paper Open

1. University of La Rochelle, L3i, F-17000, La Rochelle, France
2. University of Toulouse - IRIT

We present a collection of Named Entity Recognition (NER) systems for six Slavic languages: Bulgarian, Czech, Polish, Slovenian, Russian and Ukrainian. These NER systems were trained using different BERT models and Frustratingly Easy Domain Adaptation (FEDA). FEDA allows us to create NER systems from multiple datasets without having to worry about whether the tagsets (e.g. Location, Event, Miscellaneous, Time) in the source and target domains match, while increasing the amount of data available for training. Moreover, we boosted the prediction of named entities by marking uppercase words and predicting masked words. Participating in the 3rd Shared Task on SlavNER, our NER systems reached a strict micro F-score of up to 0.908. The results demonstrate good generalization, even for named entities with weak regularity, such as book titles, or entities
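As a rough illustration of the idea behind FEDA, the sketch below shows the classic feature-augmentation form of frustratingly easy domain adaptation: every example keeps one shared copy of its features plus one copy in a slot reserved for its own dataset, so a single model can learn both general and dataset-specific weights. This is a minimal sketch under stated assumptions, not the architecture used in the paper; the function name `feda_augment` and its parameters are hypothetical and chosen only for this example.

```python
from typing import List, Sequence


def feda_augment(features: Sequence[float], domain_index: int, num_domains: int) -> List[float]:
    """Map a feature vector to [general copy | slot for domain 0 | ... | slot for domain K-1].

    Hypothetical helper for illustration: only the general slot and the slot
    belonging to `domain_index` are filled; all other slots stay at zero.
    """
    if not 0 <= domain_index < num_domains:
        raise ValueError("domain_index must be in [0, num_domains)")

    dim = len(features)
    augmented = [0.0] * (dim * (num_domains + 1))

    # Shared copy: seen by every domain, so it captures what the datasets have in common.
    augmented[0:dim] = features

    # Domain-specific copy: only examples from this domain activate this slot.
    start = dim * (domain_index + 1)
    augmented[start:start + dim] = features

    return augmented


# Two datasets (domains), two-dimensional features.
x = [1.0, 2.0]
source_view = feda_augment(x, domain_index=0, num_domains=2)
target_view = feda_augment(x, domain_index=1, num_domains=2)
```

Here `source_view` fills the general slot and the first domain slot, while `target_view` fills the general slot and the second. In a neural setting the same idea is often realized with a shared component plus one component per dataset, which is one way datasets with different tagsets can be combined; whether the paper does exactly this is not stated in the abstract.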

Files

Name	Size
2021.bsnlp-1.12.pdf md5:3c05fbb7ede7c73392b747b5f94aec49	336.3 kB

Additional details

Funding: European Commission
NewsEye: A Digital Investigator for Historical Newspapers (grant 770299)
EMBEDDIA: Cross-Lingual Embeddings for Less-Represented Languages in European News Media (grant 825153)

Resource type

Conference paper

Publisher

Zenodo

Conference

In Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing, in conjunction with EACL2021 (BSNLP@EACL2021), 20 April 2021

Languages

English

Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: April 30, 2021
Modified: July 19, 2024
