Published December 1, 2022 | Version v1
Conference paper Open

Analysis of Corpus-based Word-Order Typological Methods

  • 1. Faculty of Humanities and Social Sciences, University of Zagreb
  • 2. Faculty of Mathematics and Physics, Charles University

Description

This article presents a comparative analysis of four syntactic typological approaches applied to 20 languages. We compare three quantitative methods, each computed from parallel CoNLL-U corpora, with the classification obtained from syntactic features provided by a typological database (lang2vec). The first is the Marsagram linear approach, which extracts the frequencies of word-order patterns describing the position of components inside syntactic nodes. The second considers the relative position of heads and dependents, and the third is based solely on the relative position of verbs and objects. The results show that each method yields different language clusters, which can be compared to the classic genealogical classification; the lang2vec and head-and-dependent methods come closest to it. Because each strategy considers different word-order phenomena, each provides a different angle of analysis, to be chosen according to the precise needs of the researcher.
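To make the second and third approaches concrete, the following is a minimal sketch of how head/dependent order and verb/object order could be counted from CoNLL-U data. It is not the authors' implementation: the function names (`parse_conllu`, `head_direction_counts`, `verb_object_counts`) and the toy sentence are assumptions made for illustration, and the features used in the paper may well be finer-grained (for example, split by part of speech or dependency relation).

```python
from collections import Counter

# Toy CoNLL-U sentence (hypothetical example, not taken from the paper's corpora).
SAMPLE = """\
# text = The dog chased the cat.
1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_
2\tdog\tdog\tNOUN\t_\t_\t3\tnsubj\t_\t_
3\tchased\tchase\tVERB\t_\t_\t0\troot\t_\t_
4\tthe\tthe\tDET\t_\t_\t5\tdet\t_\t_
5\tcat\tcat\tNOUN\t_\t_\t3\tobj\t_\t_
6\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_
"""


def parse_conllu(text):
    """Return a list of sentences; each token is (id, upos, head, deprel)."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if not cols[0].isdigit() or not cols[6].isdigit():
            continue
        current.append((int(cols[0]), cols[3], int(cols[6]), cols[7]))
    if current:
        sentences.append(current)
    return sentences


def head_direction_counts(sentences):
    """Count dependents whose head precedes them (head_initial) or follows them (head_final)."""
    counts = Counter()
    for sent in sentences:
        for tok_id, _upos, head, deprel in sent:
            if head == 0 or deprel.split(":")[0] == "punct":
                continue  # ignore the root and punctuation
            counts["head_initial" if head < tok_id else "head_final"] += 1
    return counts


def verb_object_counts(sentences):
    """Count VO versus OV orders for 'obj' dependents attached to a VERB."""
    counts = Counter()
    for sent in sentences:
        upos_by_id = {tok_id: upos for tok_id, upos, _head, _deprel in sent}
        for tok_id, _upos, head, deprel in sent:
            if deprel.split(":")[0] == "obj" and upos_by_id.get(head) == "VERB":
                counts["VO" if head < tok_id else "OV"] += 1
    return counts


if __name__ == "__main__":
    sents = parse_conllu(SAMPLE)
    print(head_direction_counts(sents))
    print(verb_object_counts(sents))
```

Normalizing such counts into proportions per language would give one feature vector per language, which could then be passed to a clustering method of choice; that step is left out here because the record does not specify which one was used.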

Files

2023.udw-1.5.pdf (234.6 kB)
md5:db809058581fabaf78fd0de9c001a350

Additional details

Funding

European Commission
Cleopatra – Cross-lingual Event-centric Open Analytics Research Academy (grant no. 812997)