Published October 23, 2018 | Version v1
Book chapter Open

Sequence models and lexical resources for MWE identification in French

  • 1. Aix Marseille Univ, Université de Toulon, CNRS, LIS, Marseille, France

Description

We present a simple and efficient sequence tagger that identifies continuous multiword expressions (MWEs) of several categories in French texts. It is based on conditional random fields (CRF) and uses local context features such as the lemmas and parts of speech of the previous and next words. We show that, in some cases, this approach obtains results close to those of more sophisticated parser-based MWE identification methods, without requiring syntactic trees from a treebank. Moreover, we study how well the CRF can take into account external information from both high-quality hand-crafted lexicons and MWE lists automatically obtained from large monolingual corpora. Results indicate that external information systematically helps improve the tagger's performance, compensating for the limited amount of training data.
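To make the feature description above concrete, here is a minimal sketch of how per-token features of this kind might be built. It is not the chapter's implementation: the function name `token_features`, the feature names, the tag set, the example sentence, and the lexicon format (a set of lemma tuples) are all assumptions made for illustration. The actual feature templates are defined in the chapter itself.

```python
from typing import Dict, List, Set, Tuple

Token = Tuple[str, str]  # (lemma, part-of-speech tag); tag names are illustrative


def token_features(sent: List[Token], i: int,
                   lexicon: Set[Tuple[str, ...]]) -> Dict[str, object]:
    """Build a feature dict for token i from its local context.

    Hypothetical feature set: current, previous and next lemma/POS,
    plus one binary feature saying whether the token is covered by a
    contiguous lexicon entry (matched on lemmas).
    """
    lemma, pos = sent[i]
    feats: Dict[str, object] = {"lemma": lemma, "pos": pos}

    # Previous-word context, or a sentence-start marker.
    if i > 0:
        feats["prev_lemma"], feats["prev_pos"] = sent[i - 1]
    else:
        feats["BOS"] = True

    # Next-word context, or a sentence-end marker.
    if i < len(sent) - 1:
        feats["next_lemma"], feats["next_pos"] = sent[i + 1]
    else:
        feats["EOS"] = True

    # External-lexicon feature: try every entry length n and every
    # start position s such that the span [s, s + n) contains token i.
    lemmas = [l for l, _ in sent]
    entry_lengths = {len(entry) for entry in lexicon}
    feats["in_lexicon"] = any(
        tuple(lemmas[s:s + n]) in lexicon
        for n in entry_lengths
        for s in range(max(0, i - n + 1), i + 1)
        if s + n <= len(lemmas)
    )
    return feats


# Illustrative input: "il mange une pomme de terre", already lemmatized.
sent = [("il", "CL"), ("manger", "V"), ("un", "DET"),
        ("pomme", "N"), ("de", "P"), ("terre", "N")]
lexicon = {("pomme", "de", "terre")}  # hypothetical hand-crafted entry
feats = [token_features(sent, i, lexicon) for i in range(len(sent))]
```

Each sentence becomes a list of feature dictionaries, one per token, which a CRF trainer would pair with per-token MWE labels. Collapsing the lexicon lookup into a single binary feature is a simplification chosen for brevity here; a finer-grained encoding (for example, the position of the token inside the matched entry) is equally plausible.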

Files

10.pdf (292.6 kB)
md5:38c4a7c490cb354fd530575f126f5f34