Published June 29, 2021 | Version v1
Book chapter Open

Historical changes in semantic weights of sub-word units

  • 1. San Diego State University

Description

In this chapter, we present a computational study of how the weight of sub-word
units in determining word meanings evolves over time in different languages.
Sub-word units, such as morphemes and syllables, play variable roles in
determining word meanings. Some morphemes in English have standalone lexical
meanings (e.g., roots), while others function more as morpho-syntactic markers (e.g.,
bound morphemes such as -ness). The semantic weight of sub-word units
changes over time; for instance, some ancient characters in Chinese and some ancient
prefixes in English no longer carry clear meanings. The goal of this chapter
is to characterize this change with computational methods. The semantic weight
of sub-word units can be captured by word embedding models and their variants.
We present results from two substudies. In Study 1, we propose a novel neural
network-based word embedding model that captures the semantic weights of
sub-word units. We draw a comparison between Chinese and Indo-European languages
in how the semantic weights of sub-words change over time, and show that the
weights of characters in Chinese (字 zi, the basic sub-word unit in Chinese) are
higher in ancient Chinese and lower in modern Chinese, while the opposite trend
is observed in Indo-European languages. This is in accordance with theories about
the monosyllabic-to-bisyllabic shift in Chinese and the conjectured
synthetic-to-analytic shift in Indo-European languages. In Study 2, we apply a different
embedding model to another corpus to test the finding of Study 1. Although the
chronological pattern of semantic weight found there is inconsistent with that in Study 1, the
results are still meaningful in that they reveal historical changes
in sub-word level semantic weights across different corpora and languages.
Our chapter calls for more systematic studies of the applicability of computational
embedding methods to modeling sub-word semantics. Although discrepancies
are found across current models and corpora, our empirical findings suggest that
word-level semantic composition is a dynamic process that reflects historical change.


Files

303-TahmasebiEtAl-2021-5.pdf

Files (442.6 kB)

md5:44d7034135047f60ca0b6b76d93f530d

Additional details

Related works

Is part of
978-3-96110-312-6 (ISBN)
10.5281/zenodo.5040241 (DOI)