Conference paper Open Access

VMWE discovery: a comparative analysis between Literature and Twitter Corpora

Stamou, Vivian; Xylogianni, Artemis; Malli, Marilena; Takorou, Penny; Markantonatou, Stella

We manually evaluate five lexical association measures, namely log-likelihood (LL), maximum likelihood estimation (MLE), T-score, pointwise mutual information (PMI) and Dice, for the discovery of Modern Greek verb multiword expressions (VMWEs) with two or more lexicalised components, using mwetoolkit3 (Ramisch et al., 2010). We use Twitter corpora and compare our findings with previous work on fiction corpora. The results of LL, MLE and T-score were found to overlap significantly in both the fiction and the Twitter corpora, while the results of PMI and Dice do not. We find that MWEs with two lexicalised components are more frequent in Twitter than in fiction corpora, and that lean syntactic patterns help retrieve them more efficiently than richer ones. Our work (i) supports the enrichment of 'IDION' (Markantonatou et al., 2019), the lexicographical database of Modern Greek MWEs, and (ii) highlights aspects of using the five association measures on specific text genres to obtain the best MWE discovery results.
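The five measures named in the abstract are standard association scores for word co-occurrences. A minimal sketch, assuming the textbook two-word formulas computed from raw corpus counts, is given below. mwetoolkit3's own implementation, in particular its handling of candidates longer than two words, may differ. The helper names `association_scores`, `top_k` and `jaccard` are hypothetical. The Jaccard overlap of top-k lists is only one simple way to quantify how much two measures "overlap", not necessarily the criterion used in the paper. The toy counts are invented for illustration and are not taken from the paper's corpora.

```python
import math

def association_scores(c12, c1, c2, n):
    """Textbook association scores for a two-word candidate (w1, w2).

    c12: co-occurrence count of the candidate (must be > 0)
    c1, c2: corpus counts of w1 and w2
    n: corpus size in tokens
    """
    # Expected co-occurrence count if w1 and w2 were independent.
    expected = c1 * c2 / n
    # Observed 2x2 contingency table: (w1,w2), (w1,not w2), (not w1,w2), (neither).
    observed = [c12, c1 - c12, c2 - c12, n - c1 - c2 + c12]
    # Expected cells from row totals (c1, n - c1) and column totals (c2, n - c2).
    expected_cells = [
        c1 * c2 / n,
        c1 * (n - c2) / n,
        (n - c1) * c2 / n,
        (n - c1) * (n - c2) / n,
    ]
    # Dunning's log-likelihood ratio (G^2); empty cells contribute 0.
    ll = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected_cells) if o > 0)
    return {
        "mle": c12 / n,                          # relative frequency
        "pmi": math.log2(c12 / expected),        # pointwise mutual information
        "t": (c12 - expected) / math.sqrt(c12),  # t-score
        "dice": 2 * c12 / (c1 + c2),             # Dice coefficient
        "ll": ll,                                # log-likelihood
    }

def top_k(scores, measure, k):
    """Set of the k candidates ranked highest by one measure."""
    ranked = sorted(scores, key=lambda cand: scores[cand][measure], reverse=True)
    return set(ranked[:k])

def jaccard(a, b):
    """Overlap of two candidate sets (1.0 = identical, 0.0 = disjoint)."""
    return len(a & b) / len(a | b) if a | b else 1.0

# Invented toy counts (NOT data from the paper): candidate -> (c12, c1, c2).
N = 10_000
toy = {
    "cand_a": (10, 20, 30),
    "cand_b": (5, 400, 300),
    "cand_c": (3, 4, 5),
    "cand_d": (50, 900, 800),
}
scores = {cand: association_scores(*counts, N) for cand, counts in toy.items()}

for m in ("mle", "t", "ll", "pmi", "dice"):
    print(m, sorted(top_k(scores, m, 2)))
print("LL vs T-score overlap:", jaccard(top_k(scores, "ll", 2), top_k(scores, "t", 2)))
```

Even on these toy counts, the frequency-driven scores (MLE, T-score) favour frequent candidates, while PMI and Dice reward rare words that almost always co-occur (here `cand_c`). This is the usual reason such measures produce different candidate rankings. On real data, the same comparison would be run over the candidate lists extracted for each syntactic pattern.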
