Cabrera-Diego, Luis Adrián
Moreno, Jose G.
Doucet, Antoine
2021-04-30
<p>We present a collection of Named Entity Recognition (NER) systems for six Slavic languages: Bulgarian, Czech, Polish, Slovenian, Russian and Ukrainian. These NER systems have been trained using different BERT models and Frustratingly Easy Domain Adaptation (FEDA). FEDA allows us to create NER systems from multiple datasets without having to worry about whether the tagsets (e.g. Location, Event, Miscellaneous, Time) in the source and target domains match, while increasing the amount of data available for training. Moreover, we improved the prediction of named entities by marking uppercase words and predicting masked words. Participating in the 3rd Shared Task on SlavNER, our NER systems reached a strict micro F-score of up to 0.908. The results demonstrate good generalization, even in named entities with weak regularity, such as book titles, or entities</p>
https://doi.org/10.5281/zenodo.4730478
oai:zenodo.org:4730478
eng
Zenodo
https://zenodo.org/communities/newseye
https://zenodo.org/communities/embeddia
https://zenodo.org/communities/eu
https://doi.org/10.5281/zenodo.4730477
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
BSNLP@EACL2021, In Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing in conjunction with EACL2021, 20 April 2021
Using a Frustratingly Easy Domain and Tagset Adaptation for Creating Slavic Named Entity Recognition Systems
info:eu-repo/semantics/conferencePaper