Communicative efficiency and syntactic predictability: A crosslinguistic study based on the Universal Dependencies corpora
Creators
Description
There is ample evidence that human communication is organized efficiently: more predictable information is usually encoded by shorter linguistic forms, and less predictable information by longer ones. The present study, which is based on the Universal Dependencies corpora, investigates whether the length of words can be predicted from their average syntactic information content, defined as the average information content of a word given its counterpart in a dyadic syntactic relationship. The effect of this variable is tested on data from nine typologically diverse languages while controlling for several other well-known parameters: word frequency and average word predictability based on the preceding and following words. Poisson generalized linear models and conditional random forests show that words with higher average syntactic informativity are usually longer in most languages, although this effect often appears in interaction with average information content based on the neighbouring words. The results demonstrate that syntactic predictability should be considered a separate factor in future work on communicative efficiency.
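The abstract defines average syntactic information content only in words. A minimal sketch of one reading of it: for a word w with n_w occurrences in dependency pairs, ASIC(w) = (1/n_w) Σ_i −log2 P(w | c_i), where c_i is the counterpart (head or dependent) in the i-th pair. The estimation details here are assumptions not stated in the record: P(w | c) is a maximum-likelihood estimate from wordform co-occurrence counts, wordforms are case-folded, the head and dependent roles are pooled, relation labels are ignored, and root links are skipped. The function names are hypothetical; the input is a string in the CoNLL-U format used by the UD corpora.

```python
import math
from collections import Counter, defaultdict


def arcs_from_conllu(text):
    """Yield (head_form, dependent_form) for every basic dependency in a
    CoNLL-U string, skipping root links, multiword-token ranges and empty nodes."""
    forms, heads = {}, []
    for line in text.splitlines() + [""]:
        if not line.strip():  # a blank line ends a sentence
            for dep_id, head_id in heads:
                if head_id != "0":  # HEAD 0 marks the root, which has no counterpart
                    yield forms[head_id], forms[dep_id]
            forms, heads = {}, []
        elif not line.startswith("#"):  # skip sentence-level comments
            cols = line.split("\t")
            token_id = cols[0]
            if "-" in token_id or "." in token_id:
                continue  # multiword-token range or empty node
            forms[token_id] = cols[1].lower()  # FORM column, case-folded (assumption)
            heads.append((token_id, cols[6]))  # HEAD column


def word_counterpart_pairs(arcs):
    """Turn each arc into two (word, counterpart) pairs, so a word is scored
    both as a head and as a dependent (assumption: both roles pooled)."""
    for head, dep in arcs:
        yield dep, head
        yield head, dep


def average_syntactic_information(pairs):
    """Mean surprisal -log2 P(word | counterpart) over all occurrences of each
    word, with P estimated by maximum likelihood from the pair counts."""
    pairs = list(pairs)
    joint = Counter(pairs)
    counterpart_totals = Counter(c for _, c in pairs)
    totals, occurrences = defaultdict(float), Counter()
    for word, counterpart in pairs:
        p = joint[word, counterpart] / counterpart_totals[counterpart]
        totals[word] -= math.log2(p)  # add this occurrence's surprisal in bits
        occurrences[word] += 1
    return {w: totals[w] / occurrences[w] for w in occurrences}
```

Passing the text of a UD treebank file through the three functions in turn yields a mapping from lowercased wordform to average surprisal in bits. Words that occur with many different or rare counterparts get higher values, and these scores could then be related to word length alongside frequency and context-based predictability.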
Files
Name | Size | MD5 checksum |
---|---|---|
Levshina_SyntacticPredictability_UD2017.pdf | 529.5 kB | 29216115d11148be7c8121e1c0c0e716 |