Published September 6, 2021 | Version v1
Conference paper Open

Cracking a Walnut with a Sledgehammer: XLM-RoBERTa for German Verbal Idiom Disambiguation Tasks

  • 1. Göttingen Centre for Digital Humanities, Georg-August-Universität Göttingen

Description

This paper describes the team's efforts to solve the Shared Task on the Disambiguation of German Verbal Idioms at KONVENS 2021, including its efforts to extend the training data semi-automatically. The disambiguation task was solved using XLM-RoBERTa, which delivered the best results with an F1 score of 0.76 on all tested non-idiomatic instances in the test set. The baseline model, a linear SVM, achieved an F1 score of 0.55. Furthermore, additional data was collected to enhance the training data set with respect to the literal use of idiomatic expressions. While the baseline model improved slightly with the additional training data, the XLM-RoBERTa model performed better when only the core training data was provided.
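The scores above are F1 values reported for one class (the non-idiomatic, i.e. literal, instances). As a minimal sketch of how such a per-class F1 score is computed, assuming the standard definition as the harmonic mean of precision and recall: the function name `f1_for_class`, the label strings, and the toy data below are hypothetical and are not taken from the paper or the shared task data.

```python
def f1_for_class(gold, predicted, target):
    """Return (precision, recall, F1) for a single class label."""
    pairs = list(zip(gold, predicted))
    # True positives: target class predicted and correct.
    tp = sum(1 for g, p in pairs if g == target and p == target)
    # False positives: target class predicted but gold label differs.
    fp = sum(1 for g, p in pairs if g != target and p == target)
    # False negatives: gold label is the target class but it was missed.
    fn = sum(1 for g, p in pairs if g == target and p != target)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy labels invented for illustration; NOT data from the shared task.
gold = ["literal", "idiomatic", "literal", "idiomatic", "literal"]
predicted = ["literal", "idiomatic", "idiomatic", "idiomatic", "literal"]

precision, recall, f1 = f1_for_class(gold, predicted, "literal")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy example one of the three literal instances is misclassified as idiomatic, so recall for the literal class drops while precision stays perfect. This is the kind of error a per-class score makes visible when literal uses are rare in the training data.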

Files

KONVENS_2021_Disambiguation_ST-XLM-RoBERTa_for_German_Verbal_Idiom_Disambiguation_Tasks.pdf
