Published September 25, 2019 | Version v1
Journal article Open

LEARNING CROSS-LINGUAL WORD EMBEDDINGS WITH UNIVERSAL CONCEPTS

  • 1. Department of Computer Science, University of Alabama, Tuscaloosa, Alabama

Description

Recent advances in generating monolingual word embeddings from word co-occurrence have inspired new efforts to extend the model to support diverse languages. State-of-the-art methods for learning cross-lingual word embeddings rely on aligning monolingual word embedding spaces. Our goal is to apply word co-occurrence across languages by means of universal concepts. Such concepts are notions that are fundamental to humankind and thus persist across languages, e.g., man or woman, war or peace. Given bilingual lexicons, we build universal concepts as undirected graphs of connected nodes and then replace the words belonging to the same graph with a unique graph ID. This intuitive design makes use of universal concepts in monolingual corpora, which helps generate meaningful word embeddings across languages via word co-occurrence. Standardized benchmarks show that this underutilized approach is competitive with state-of-the-art methods on bilingual word semantic similarity and on word similarity and relatedness tasks.
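The graph-building and replacement step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (build_concept_ids, replace_with_concepts), the CONCEPT_n ID format, the language-tagged node representation, and the toy lexicon entries are all assumptions made for illustration. Union-find is used here as one common way to group connected nodes of an undirected graph.

```python
def build_concept_ids(lexicon_pairs):
    """Group lexicon entries into connected components and assign each an ID.

    lexicon_pairs: iterable of ((lang, word), (lang, word)) translation pairs.
    Nodes are tagged with their language so that identically spelled words
    in different languages stay distinct (an assumption of this sketch).
    """
    parent = {}

    def find(node):
        # Locate the representative of the node's component.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # Each translation pair is an undirected edge: merge the two components.
    for left, right in lexicon_pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    # Give every component one ID (hypothetical "CONCEPT_n" format).
    component_id = {}
    concept_of = {}
    for node in list(parent):
        root = find(node)
        if root not in component_id:
            component_id[root] = f"CONCEPT_{len(component_id)}"
        concept_of[node] = component_id[root]
    return concept_of


def replace_with_concepts(tokens, lang, concept_of):
    """Replace tokens that belong to a concept graph with that graph's ID."""
    return [concept_of.get((lang, token), token) for token in tokens]


# Toy bilingual lexicon (illustrative entries only).
pairs = [
    (("en", "man"), ("es", "hombre")),
    (("en", "man"), ("es", "varón")),
    (("en", "peace"), ("es", "paz")),
]
concept_of = build_concept_ids(pairs)
english = replace_with_concepts(["the", "man", "wants", "peace"], "en", concept_of)
spanish = replace_with_concepts(["el", "hombre", "quiere", "paz"], "es", concept_of)
```

After replacement, "man" and "hombre" share one token in both corpora, so a standard co-occurrence-based embedding model trained on the rewritten corpora would see them as the same vocabulary item. Words outside the lexicon are left unchanged.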

Files

10319ijwsc02.pdf (620.8 kB)
md5:34ad01d9cbc7238fa0d3d86daf55ae9c