Sequence models and lexical resources for MWE identification in French

doi:10.5281/zenodo.1469567

Published October 23, 2018 | Version v1

Book chapter Open

1. Aix Marseille Univ, Université de Toulon, CNRS, LIS, Marseille, France

We present a simple and efficient sequence tagger capable of identifying continuous multiword expressions (MWEs) of several categories in French texts. It is based on conditional random fields (CRF), using as features local context information such as previous and next word lemmas and parts of speech. We show that this approach can obtain results that, in some cases, come close to those of more sophisticated parser-based MWE identification methods, without requiring syntactic trees from a treebank. Moreover, we study how well the CRF can take into account external information coming from both high-quality hand-crafted lexicons and MWE lists automatically obtained from large monolingual corpora. Results indicate that external information systematically helps improve the tagger's performance, compensating for the limited amount of training data.
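The kind of local-context features the abstract mentions can be illustrated with a minimal sketch. It assumes tokens are given as (lemma, POS) pairs and builds one feature dictionary per token, in the shape commonly fed to CRF toolkits. The function names, the feature keys, the boundary markers `<BOS>`/`<EOS>`, and the `in_lexicon` flag are hypothetical choices made for illustration; the chapter's actual feature templates are not stated on this page.

```python
from typing import Dict, List, Set, Tuple

Token = Tuple[str, str]      # (lemma, part-of-speech tag)
Entry = Tuple[str, ...]      # an MWE lexicon entry as a tuple of lemmas


def covered_by_lexicon(lemmas: List[str], i: int, lexicon: Set[Entry]) -> bool:
    """Return True if token i lies inside a contiguous match of any lexicon entry."""
    for entry in lexicon:
        n = len(entry)
        # Try every start position whose n-token window would include index i.
        for start in range(max(0, i - n + 1), min(i, len(lemmas) - n) + 1):
            if tuple(lemmas[start:start + n]) == entry:
                return True
    return False


def token_features(sent: List[Token], i: int, lexicon: Set[Entry]) -> Dict[str, object]:
    """Features for token i: its own lemma/POS, its neighbours', and a lexicon flag."""
    lemma, pos = sent[i]
    prev_lemma, prev_pos = sent[i - 1] if i > 0 else ("<BOS>", "<BOS>")
    next_lemma, next_pos = sent[i + 1] if i < len(sent) - 1 else ("<EOS>", "<EOS>")
    lemmas = [tok_lemma for tok_lemma, _ in sent]
    return {
        "lemma": lemma,
        "pos": pos,
        "lemma-1": prev_lemma,
        "pos-1": prev_pos,
        "lemma+1": next_lemma,
        "pos+1": next_pos,
        # Hypothetical external-resource feature: is the token part of a known MWE?
        "in_lexicon": covered_by_lexicon(lemmas, i, lexicon),
    }


def sentence_features(sent: List[Token], lexicon: Set[Entry]) -> List[Dict[str, object]]:
    """One feature dictionary per token, in sentence order."""
    return [token_features(sent, i, lexicon) for i in range(len(sent))]


# Toy example: "il mange une pomme de terre", lemmatised and POS-tagged by hand.
sentence = [("il", "PRON"), ("manger", "VERB"), ("un", "DET"),
            ("pomme", "NOUN"), ("de", "ADP"), ("terre", "NOUN")]
toy_lexicon = {("pomme", "de", "terre")}
features = sentence_features(sentence, toy_lexicon)
```

A list of such dictionaries, paired with per-token labels (for example a begin/inside/outside encoding of continuous MWEs), is the usual training input for a linear-chain CRF.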

Files

10.pdf (292.6 kB), md5:38c4a7c490cb354fd530575f126f5f34

Resource type

Book chapter

Publisher

Language Science Press

Imprint

Multiword expressions at length and in depth, 263-297. Berlin. ISBN: 978-3-96110-123-8.

Languages

English

Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: November 21, 2018
Modified: August 2, 2024
