Analysis of Corpus-based Word-Order Typological Methods
- 1. Faculty of Humanities and Social Sciences, University of Zagreb
- 2. Faculty of Mathematics and Physics, Charles University
Description
This article presents a comparative analysis of four syntactic typological approaches applied to 20 languages. We compared three quantitative methods, applied to parallel CoNLL-U corpora, with the classification obtained from syntactic features provided by a typological database (lang2vec). First, we analyzed the Marsagram linear approach, which extracts the frequencies of word-order patterns describing the position of components inside syntactic nodes. The second approach considers the relative position of heads and dependents, and the third is based simply on the relative position of verbs and objects. The results show that each method yields different language clusters, which can be compared to the classic genealogical classification (the lang2vec and the head-and-dependent methods being the closest). As these typological strategies consider different word-order phenomena, each one provides a different angle of analysis, to be chosen according to the precise needs of the researcher.
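To make the second and third approaches concrete, the following minimal sketch counts head-initial versus head-final dependencies, and verb-object (VO) versus object-verb (OV) orders, in a CoNLL-U string. It is an illustration only, not the authors' implementation: the function names `sentences` and `order_counts`, the sample sentence, and the choice to count only `obj` dependents of `VERB` heads are assumptions made here, and the feature definitions actually used in the article may differ.

```python
from collections import Counter

# Hypothetical one-sentence sample in CoNLL-U format (10 tab-separated columns).
SAMPLE = """\
# text = The dog chased the cat
1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_
2\tdog\tdog\tNOUN\t_\t_\t3\tnsubj\t_\t_
3\tchased\tchase\tVERB\t_\t_\t0\troot\t_\t_
4\tthe\tthe\tDET\t_\t_\t5\tdet\t_\t_
5\tcat\tcat\tNOUN\t_\t_\t3\tobj\t_\t_
"""


def sentences(text):
    """Yield each sentence as a list of (id, upos, head, deprel) tuples."""
    sent = []
    for line in text.splitlines():
        line = line.strip()
        if not line:  # a blank line ends a sentence
            if sent:
                yield sent
                sent = []
            continue
        if line.startswith("#"):  # comment / metadata line
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if not cols[0].isdigit():
            continue
        # Assumes the HEAD column is filled in (i.e. the corpus is parsed).
        sent.append((int(cols[0]), cols[3], int(cols[6]), cols[7]))
    if sent:
        yield sent


def order_counts(text):
    """Count head/dependent orders and verb/object orders in a CoNLL-U text."""
    counts = Counter()
    for sent in sentences(text):
        upos = {tid: pos for tid, pos, _, _ in sent}
        for tid, _, head, deprel in sent:
            if head == 0:  # the root has no head to compare against
                continue
            head_first = head < tid
            counts["head_initial" if head_first else "head_final"] += 1
            # Assumption: "object" means a dependent with relation `obj`
            # whose head is tagged VERB.
            if deprel == "obj" and upos.get(head) == "VERB":
                counts["VO" if head_first else "OV"] += 1
    return counts


counts = order_counts(SAMPLE)
vo_total = counts["VO"] + counts["OV"]
vo_ratio = counts["VO"] / vo_total if vo_total else None
```

Under these assumptions, per-language proportions such as `vo_ratio` could serve as the features from which languages are compared and clustered.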
Files
2023.udw-1.5.pdf (234.6 kB)
md5:db809058581fabaf78fd0de9c001a350