Modelling multiword expressions in a parallel Bulgarian-English newsmedia corpus

doi:10.5281/zenodo.1182603

Language Science Press

Published February 21, 2018 | Version v1

Book chapter Open

The paper focuses on the modelling of multiword expressions (MWEs) in Bulgarian-English parallel news corpora (SETimes; CSLI dataset and PennTreebank dataset). Observations were made on alignments in which at least one multiword expression was used per language. The multiword expressions were classified with respect to the PARSEME lexicon-based (WG1) and treebank-based (WG4) classifications. The non-MWE counterparts of MWEs are also considered. Our approach is data-driven because the data for this study were retrieved from parallel corpora rather than from bilingual dictionaries. The survey shows that the predominant translation relation between Bulgarian and English is MWE-to-word, and that this relation does not exclude other translation options. To formalize our observations, a catenae-based modelling of the parallel pairs is proposed.

Files

Name	Size
9.pdf (md5:e3a3f0ded36a4bda12b060378b51784b)	233.3 kB

Resource type

Book chapter

Publisher

Language Science Press

Imprint

Multiword expressions: Insights from a multi-lingual perspective, 247-269. Berlin. ISBN: 978-3-96110-063-7.

Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: May 22, 2018
Modified: August 2, 2024
