Flexibility of multiword expressions and Corpus Pattern Analysis
Description
This chapter is set in the context of Corpus Pattern Analysis (CPA), a technique developed by Patrick Hanks to map meaning onto word patterns found in corpora. The main output of CPA is the Pattern Dictionary of English Verbs (PDEV), which currently describes patterns for over 1,600 verbs, many of which are acknowledged to be multiword expressions (MWEs) such as phrasal verbs or idioms. PDEV entries are produced manually by lexicographers on the basis of a substantial sample of concordance lines from the corpus, so building the resource is very time-consuming. The motivation for the work presented in this chapter is to speed up the discovery of these word patterns using methods that can be transferred to other languages. The chapter explores the benefits of a detailed contrastive analysis of MWEs found in English and French corpora, with a view to English-French translation. The comparative analysis is conducted through a case study of the pair (bite, mordre), which illustrates both CPA and the application of statistical measures to the automatic extraction of MWEs. The approach takes as its point of departure the use of statistics initially developed by Church & Hanks (1989). Here we look at statistical measures that have not yet been tested for their ability to discover new collocates but are useful for characterizing verbal MWEs that have already been found. In particular, we propose measures to characterize the mean span, rigidity, diversity, and idiomaticity of a given MWE.
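As a rough illustration of the kind of statistics involved, the sketch below computes a pointwise mutual information score in the spirit of Church & Hanks (1989), together with two simple window-based measures labelled here as mean span and rigidity. These are assumptions for illustration only, not the chapter's definitions: the symmetric window, the function names (offsets, pmi, mean_span, rigidity) and the choice of "share of co-occurrences at the most frequent offset" as a rigidity proxy are all hypothetical. Diversity and idiomaticity are not sketched.

```python
import math
from collections import Counter
from statistics import mean


def offsets(tokens, node, collocate, window=5):
    """Signed positions of `collocate` relative to each occurrence of `node`
    within +/- `window` tokens (hypothetical helper; symmetric window assumed)."""
    result = []
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] == collocate:
                result.append(j - i)
    return result


def pmi(tokens, node, collocate, window=5):
    """log2( P(x, y) / (P(x) P(y)) ), with probabilities estimated as
    frequency / corpus size and joint counts taken within the window."""
    n = len(tokens)
    counts = Counter(tokens)
    joint = len(offsets(tokens, node, collocate, window))
    if joint == 0:
        return float("-inf")  # never co-occur within the window
    return math.log2((joint * n) / (counts[node] * counts[collocate]))


def mean_span(offs):
    """Hypothetical 'mean span': average absolute distance node -> collocate."""
    return mean(abs(o) for o in offs) if offs else float("nan")


def rigidity(offs):
    """Hypothetical 'rigidity' proxy: share of co-occurrences found at the
    single most frequent offset (1.0 = the collocate always sits in one slot)."""
    if not offs:
        return float("nan")
    _, top = Counter(offs).most_common(1)[0]
    return top / len(offs)


toks = "dog bit man . dog bit cat . bird sang".split()
offs = offsets(toks, "bit", "dog", window=5)
print(offs, mean_span(offs), rigidity(offs))
print(round(pmi(toks, "bit", "dog", window=1), 3))
```

Under these assumptions, a fixed idiom would be expected to show a short mean span and a rigidity close to 1, whereas a looser verb-object combination spreads its collocate over several offsets and scores lower.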
Files
4.pdf (243.0 kB)
md5:3a9b15c519d2ec4b73259d80a7a22ecc