TermEnsembler: An ensemble learning approach to bilingual term extraction and alignment

doi:10.1075/term.00029.rep

Published July 24, 2019 | Version v1

Journal article Open

1. Jožef Stefan Institute

This paper describes TermEnsembler, a bilingual term extraction and alignment system that uses a novel ensemble learning approach to bilingual term alignment. Processing starts with monolingual term extraction from a language industry standard file type containing aligned English and Slovenian texts. The two resulting term lists are then automatically aligned using an ensemble of seven bilingual alignment methods, which are first executed separately and then merged using weights learned with an evolutionary algorithm. In the experiments, the weights were learned on one domain and tested on two other domains. When evaluated on the top 400 aligned term pairs, the precision of term alignment is over 96%, and the share of correctly aligned multi-word unit terms exceeds 30%.
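The merging and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `merge_scores` and `precision_at_k`, the toy term pairs, the two example methods, and the fixed weights are all assumptions made for illustration. The sketch assumes each alignment method returns a score per candidate term pair and that merging is a weighted sum. The evolutionary weight learning is not shown, so the weights are simply taken as given.

```python
from typing import Dict, List, Set, Tuple

# A candidate alignment: (source-language term, target-language term).
Pair = Tuple[str, str]


def merge_scores(
    method_scores: List[Dict[Pair, float]], weights: List[float]
) -> List[Tuple[Pair, float]]:
    """Combine per-method scores into one ranking by weighted sum (assumed scheme)."""
    if len(method_scores) != len(weights):
        raise ValueError("need exactly one weight per alignment method")
    combined: Dict[Pair, float] = {}
    for scores, weight in zip(method_scores, weights):
        for pair, score in scores.items():
            combined[pair] = combined.get(pair, 0.0) + weight * score
    # Highest combined score first; ties are broken alphabetically by pair.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


def precision_at_k(
    ranked: List[Tuple[Pair, float]], gold: Set[Pair], k: int
) -> float:
    """Fraction of the top-k ranked pairs that appear in the gold standard."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for pair, _ in top if pair in gold) / len(top)


# Toy data: two hypothetical methods instead of the seven used in the paper.
method_a = {("term", "izraz"): 0.9, ("term", "beseda"): 0.4}
method_b = {("term", "izraz"): 0.5, ("alignment", "poravnava"): 0.8}

ranked = merge_scores([method_a, method_b], weights=[0.5, 0.5])
gold = {("term", "izraz"), ("alignment", "poravnava")}
print(ranked)
print(precision_at_k(ranked, gold, k=2))
```

In the setting described above, `k` would be 400 and the weight vector would have seven entries, one per alignment method.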

Files

Files (652.7 kB)

Name	Size
Repar2019_TermEnsembler.pdf md5:6f2b1e746807e2879888643d64628ee2	652.7 kB

Additional details

Funding: European Commission, grant 825153, EMBEDDIA – Cross-Lingual Embeddings for Less-Represented Languages in European News Media

	All versions	This version
Views	30	30
Downloads	54	54
Data volume	35.9 MB	35.9 MB
