Published May 22, 2017 | Version v1
Conference paper | Open access

Communicative efficiency and syntactic predictability: A crosslinguistic study based on the Universal Dependencies corpora

Description

There is ample evidence that human communication is organized efficiently: more predictable information is usually encoded by shorter linguistic forms, and less predictable information by longer forms. The present study, which is based on the Universal Dependencies corpora, investigates whether word length can be predicted from average syntactic information content, defined as the average information content of a word given its counterpart in a dyadic syntactic relationship. The effect of this variable is tested on data from nine typologically diverse languages while controlling for other well-known parameters: word frequency and average word predictability based on the preceding and following words. Poisson generalized linear models and conditional random forests show that words with higher average syntactic informativity tend to be longer in most languages, although this effect often appears in interactions with average information content based on the neighbouring words. These results demonstrate that syntactic predictability should be considered as a separate factor in future work on communicative efficiency.
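The key predictor, average syntactic information content, can be illustrated with a minimal Python sketch. For each word w, the score is the mean of -log2 P(w | c) over all occurrences of w with a syntactic counterpart c. The sketch assumes plain maximum-likelihood estimates without smoothing, uses lowercased word forms, and turns every basic dependency arc in a CoNLL-U file into two dyadic pairs (dependent given head, and head given dependent). These choices, and the helper names `pairs_from_conllu` and `average_information_content`, are illustrative assumptions, not the paper's exact procedure.

```python
import math
from collections import Counter, defaultdict


def pairs_from_conllu(text):
    """Extract (word, counterpart) pairs from CoNLL-U text.

    Assumption: each basic dependency arc yields two pairs,
    (dependent, head) and (head, dependent); root arcs are skipped.
    """
    pairs = []
    for block in text.strip().split("\n\n"):
        rows = [line.split("\t") for line in block.splitlines()
                if line and not line.startswith("#")]
        # Keep ordinary word lines; skip multiword tokens ("1-2") and empty nodes ("1.1").
        rows = [r for r in rows if r[0].isdigit()]
        form = {r[0]: r[1].lower() for r in rows}
        for r in rows:
            head = r[6]  # HEAD is the 7th CoNLL-U column
            if head != "0":
                dep, hd = form[r[0]], form[head]
                pairs.append((dep, hd))
                pairs.append((hd, dep))
    return pairs


def average_information_content(pairs):
    """Mean of -log2 P(word | counterpart) per word, using MLE (no smoothing)."""
    pairs = list(pairs)
    joint = Counter(pairs)                     # count(word, counterpart)
    context = Counter(c for _, c in pairs)     # count(counterpart)
    totals = defaultdict(float)
    counts = Counter()
    for w, c in pairs:
        p = joint[(w, c)] / context[c]         # P(w | c)
        totals[w] += -math.log2(p)
        counts[w] += 1
    return {w: totals[w] / counts[w] for w in counts}
```

For example, with the pairs ("dog", "barks"), ("cat", "barks"), ("dog", "sleeps"), ("the", "dog") and ("the", "cat"), "dog" averages 0.5 bits (1 bit after "barks", 0 after "sleeps"), "cat" 1 bit and "the" 0 bits. In a study like this one, such per-word averages would then enter a regression on word length alongside frequency and linear-context predictability.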

Files

Levshina_SyntacticPredictability_UD2017.pdf (529.5 kB, md5:29216115d11148be7c8121e1c0c0e716)

Additional details

Funding

FormGram – Form-frequency correspondences in grammar (670985), European Commission