Published December 1, 2021 | Version v1
Conference paper Open

Typological Approach to Improve Dependency Parsing for Croatian Language

  • 1. Faculty of Humanities and Social Sciences, University of Zagreb

Description

This article presents experiments with typological approaches based on syntactic structure, aiming to identify languages that can be combined with Croatian to improve unlabeled and labeled attachment scores (UAS and LAS) when using a deep learning tool. Of the eight selected languages, drawn from different linguistic families and genera, Slovene and Irish proved the best candidates, significantly improving dependency parsing results. Slovak was the only language that showed negative synergy when combined with Croatian. Neither of the two typological approaches presented in this study explained the synergy observed when the languages were combined: the first uses quantitative data on context-free grammar rules extracted from corpora with the Marsagram tool, and the second uses syntactic features from lang2vec language vectors. The traditional genealogical classification likewise explains neither the improvement provided by Irish nor the negative impact of Slovak on both metrics.
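For readers unfamiliar with the two metrics named in the description, the following is a minimal sketch of how UAS and LAS are conventionally computed. It is not the evaluation code used in the paper; the function name `attachment_scores` and the four-token example sentence are hypothetical and chosen only for illustration. UAS counts tokens whose predicted head matches the gold head, while LAS additionally requires the dependency label to match.

```python
from typing import List, Tuple

# One token's analysis: (index of its head, dependency label).
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as fractions of tokens scored correctly."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty token sequence")
    # UAS: only the head index has to match.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS: both the head index and the label have to match.
    labeled_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return head_hits / n, labeled_hits / n


# Hypothetical four-token sentence; heads and labels are invented for illustration.
gold = [(2, "nsubj"), (0, "root"), (4, "amod"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "amod"), (2, "obl")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")
```

In this toy case the third token has the wrong head and the fourth has the right head but the wrong label, so LAS is lower than UAS. LAS can never exceed UAS, because every labeled hit is also a head hit.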

Files

2021.tlt-1.1.pdf (108.6 kB)
md5:0110575d092ec4538669a053b7b9c859

Additional details

Funding

European Commission
Cleopatra – Cross-lingual Event-centric Open Analytics Research Academy 812997