Published June 1, 2022 | Version v1
Conference paper Open

Multilingual Comparative Analysis of Deep-Learning Dependency Parsing Results Using Parallel Corpora

  • 1. Faculty of Humanities and Social Sciences, University of Zagreb

Description

This article presents a comparative analysis of dependency parsing results for 16 languages drawn from a wide variety of linguistic families and genera, whose parallel corpora were used to train a deep-learning tool. The results are compared with an innovative classification of languages based on the head directionality parameter, which is used to perform a quantitative syntactic typological classification. Despite the use of parallel corpora, the labeled attachment score (LAS) results differ widely across languages. The results indicate that this heterogeneity is mainly due to differences in the syntactic structure of the selected languages, with Indo-European languages, especially Romance ones, obtaining the best scores. Differences in how much each language is represented in the language model used by the deep-learning tool also play a major role in dependency parsing efficacy. Other factors, such as the number of dependency labels, may also influence the results for languages with more complex labeling systems, such as Polish.
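For readers unfamiliar with the metric, LAS (labeled attachment score) is conventionally the share of tokens for which a parser predicts both the correct head and the correct dependency label. The following is a minimal sketch of that standard definition, not the evaluation code used in the paper; the function name, the (head, label) tuple representation, and the toy sentence are assumptions made for illustration.

```python
def labeled_attachment_score(gold, predicted):
    """Return the fraction of tokens whose head AND label both match.

    gold, predicted: lists of (head_index, dependency_label) tuples,
    one per token, in the same token order (hypothetical representation).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        return 0.0
    # A token counts as correct only if both head and label agree.
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


# Toy four-token sentence (illustrative data, not from the paper).
gold = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "det"), (2, "obj")]

las = labeled_attachment_score(gold, pred)
print(f"LAS = {las:.2f}")
```

In this toy case the third token has the right label but the wrong head, so it does not count as correct. Under this definition, a richer label inventory gives a parser more ways to miss the label, which is consistent with the description's remark about more complex labeling systems.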

Files

2022.bucc-1.5.pdf (215.4 kB)
md5:d21c2eea6903d0d80724c570db48f645

Additional details

Funding

Cleopatra – Cross-lingual Event-centric Open Analytics Research Academy 812997
European Commission