Published November 7, 2021 | Version v1
Conference paper (Open Access)

Cross-lingual Sentence Embedding using Multi-Task Learning

  • 1. National University of Ireland Galway
  • 2. Huawei

Description

The scarcity of labeled training data across many languages is a significant roadblock for multilingual neural language processing. We approach the lack of in-language training data using sentence embeddings that map text written in different languages, but with similar meanings, to nearby embedding space representations. The representations are produced using a dual-encoder based model trained to maximize the representational similarity between sentence pairs drawn from parallel data. The representations are enhanced using multitask training and unsupervised monolingual corpora. The effectiveness of our multilingual sentence embeddings is assessed on a comprehensive collection of monolingual, cross-lingual, and zero-shot/few-shot learning tasks.
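The dual-encoder training objective described above can be illustrated with a minimal sketch: score every source sentence against every target sentence in a batch, then apply a cross-entropy loss whose targets are the diagonal, so each sentence is pulled toward its own translation and pushed away from the in-batch negatives. This is a generic contrastive formulation with assumed toy embeddings and an assumed similarity scale; the paper's exact loss and encoder details may differ.

```python
import numpy as np

def normalize(x):
    """L2-normalize rows so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def in_batch_contrastive_loss(src_emb, tgt_emb, scale=20.0):
    """Cross-entropy over in-batch similarities: each source sentence
    should be most similar to its paired translation (the diagonal)."""
    sim = scale * normalize(src_emb) @ normalize(tgt_emb).T  # (B, B)
    sim -= sim.max(axis=1, keepdims=True)                    # stability
    log_probs = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                      # matched pairs

# Toy embeddings standing in for the outputs of the two encoders.
rng = np.random.default_rng(0)
src = rng.normal(size=(4, 8))
loss_matched = in_batch_contrastive_loss(src, src)                     # aligned pairs
loss_random = in_batch_contrastive_loss(src, rng.normal(size=(4, 8)))  # misaligned
```

Because the loss is computed over the full batch, every other sentence pair serves as a free negative example, which is what makes dual-encoder training efficient on large parallel corpora.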

Files

goswami2021crosslingual.pdf
