Published April 29, 2021 | Version v1
Book chapter Open

Cascading collocations: Collocades as correlates of formulaic language

Description

This chapter focuses on a technique for detecting, measuring and displaying traces
of formulaic language. For this purpose, a suite of computational procedures has
been developed in order to quantify the degree to which individual texts and text
types incorporate inflexible sequences of words. The approach rests on the widely
accepted view that formulaic language, though it lacks a precise definition, is
characterized by the repetition of fixed sequences. The
method involves compiling a formulexicon from a corpus of two or more text types
and then using coverage by elements of that formulexicon as an index of the degree
to which a text, possibly absent from the training corpus, is pervaded by formulaic
sequences. The problem of deciding what lengths of n-grams are warranted by the
data is dealt with by the simple expedient of binarizing coverage counts by n-grams
of various lengths. Trials on a variety of text types show that this allows collocades
– cascades of collocations, whose lengths are not pre-determined – to emerge from
the data. Here the term collocation is used in its broader sense, as in “collocations are
co-occurrences of words” (Gries 2009: 14). Software in Python 3 that implements
this approach is available online under a Creative Commons licence. Examples of
applying these procedures to a number of corpora illustrate some of the uses of
this approach.
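The pipeline described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the chapter's published software: the function names, the n-gram length range (2 to 5), and the recurrence threshold (an n-gram must occur at least twice in the training texts) are all hypothetical choices made for this example. "Binarizing" is read here as marking each token as covered or not covered, regardless of how many n-grams of whatever length cover it, so that maximal covered runs of unrestricted length can emerge.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every contiguous n-gram in tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_formulexicon(texts, min_n=2, max_n=5, min_count=2):
    """Collect n-grams of several lengths that recur in the training texts.

    min_n, max_n and min_count are illustrative assumptions.
    """
    counts = Counter()
    for tokens in texts:
        for n in range(min_n, max_n + 1):
            counts.update(ngrams(tokens, n))
    return {gram for gram, c in counts.items() if c >= min_count}

def coverage_mask(tokens, lexicon, min_n=2, max_n=5):
    """Binarized coverage: True for each token covered by any lexicon n-gram."""
    covered = [False] * len(tokens)
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) in lexicon:
                for j in range(i, i + n):
                    covered[j] = True
    return covered

def coverage_index(tokens, lexicon, min_n=2, max_n=5):
    """Proportion of tokens covered by at least one lexicon n-gram."""
    mask = coverage_mask(tokens, lexicon, min_n, max_n)
    return sum(mask) / len(mask) if mask else 0.0

def covered_runs(tokens, mask):
    """Maximal runs of consecutive covered tokens (lengths not fixed in advance)."""
    runs, current = [], []
    for tok, hit in zip(tokens, mask):
        if hit:
            current.append(tok)
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs

# Toy training corpus and an unseen text (invented for illustration).
training = [
    "once upon a time there was a king".split(),
    "once upon a time there lived a queen".split(),
]
unseen = "the tale began once upon a time there in spain".split()

lexicon = build_formulexicon(training)
mask = coverage_mask(unseen, lexicon)
print(coverage_index(unseen, lexicon))  # 0.5
print(covered_runs(unseen, mask))       # [['once', 'upon', 'a', 'time', 'there']]
```

In this toy case the five-word run surfaces even though no single n-gram length was selected beforehand, which is the sense in which overlapping n-grams of different lengths cascade into one longer sequence.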

Files

304-TrkljaGrabowski-2021-2.pdf (474.8 kB)
md5:5957f60a46358dc9f60b119603105779

Additional details

Related works

Is part of
978-3-96110-310-2 (ISBN)
10.5281/zenodo.4727623 (DOI)