Heterogeneous Document Embeddings for Cross-Lingual Text Classification

Moreo, Alejandro; Pedrotti, Andrea; Sebastiani, Fabrizio

doi:10.1145/3412841.3442093

Published March 22, 2021 | Version v1

Conference paper | Open access


1. Italian National Council of Research

Funnelling (Fun) is a method for cross-lingual text classification (CLC) based on a two-tier ensemble for heterogeneous transfer learning. In Fun, 1st-tier classifiers, each working on a different, language-dependent feature space, return a vector of calibrated posterior probabilities (with one dimension for each class) for each document, and the final classification decision is taken by a metaclassifier that uses this vector as its input. The metaclassifier can thus exploit class-class correlations, and this (among other things) gives Fun an edge over CLC systems where these correlations cannot be leveraged.
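The two-tier data flow described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper `first_tier_posteriors`, the language codes, the feature-space sizes, and the random weights are all hypothetical stand-ins for trained, calibrated 1st-tier classifiers. The sketch only shows how documents from different feature spaces end up in one shared, class-indexed space.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def first_tier_posteriors(X, W, b):
    """Hypothetical stand-in for a calibrated 1st-tier classifier: maps
    documents in a language-specific feature space to one score per class."""
    return sigmoid(X @ W + b)

rng = np.random.default_rng(0)
n_classes = 3
# Assumed feature-space sizes: each language has its own vocabulary.
feature_dims = {"en": 5, "it": 7}

posteriors = {}
for lang, dim in feature_dims.items():
    X = rng.normal(size=(4, dim))          # 4 toy documents in this language
    W = rng.normal(size=(dim, n_classes))  # toy weights, not learned here
    b = np.zeros(n_classes)
    posteriors[lang] = first_tier_posteriors(X, W, b)

# Every language now shares the same (n_docs, n_classes) representation,
# so one metaclassifier can be trained on the stacked vectors.
meta_input = np.vstack([posteriors[lang] for lang in feature_dims])
```

Because each row of `meta_input` has one entry per class regardless of the source language, a classifier trained on it can pick up correlations between classes.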

We here describe Generalized Funnelling (gFun), a learning ensemble where the metaclassifier receives as input the above vector of calibrated posterior probabilities, concatenated with document embeddings (aligned across languages) that embody other types of correlations, such as word-class correlations (as encoded by Word-Class Embeddings) and word-word correlations (as encoded by Multilingual Unsupervised or Supervised Embeddings). We show that gFun improves on Fun by describing experiments on two large, standard multilingual datasets for multi-label text classification.
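As a rough illustration of the metaclassifier input described above, the following NumPy sketch concatenates a matrix of posterior probabilities with two language-aligned document-embedding matrices. The function name `gfun_meta_input`, the embedding sizes, and the random values are assumptions made for this example only. How the document embeddings are actually computed is not covered here.

```python
import numpy as np

def gfun_meta_input(posteriors, *aligned_views):
    """Concatenate per-document posteriors with language-aligned document
    embeddings, one row per document (hypothetical helper)."""
    views = [np.asarray(posteriors)] + [np.asarray(v) for v in aligned_views]
    n_docs = views[0].shape[0]
    if any(v.shape[0] != n_docs for v in views):
        raise ValueError("all views must describe the same documents")
    return np.hstack(views)

rng = np.random.default_rng(0)
n_docs, n_classes = 4, 3
post = rng.random((n_docs, n_classes))  # toy posterior probabilities
wce = rng.random((n_docs, n_classes))   # toy word-class view (assumed size)
muse = rng.random((n_docs, 8))          # toy word-word view (assumed size)

meta_x = gfun_meta_input(post, wce, muse)
```

Each row of `meta_x` keeps the posterior vector in its first `n_classes` columns, followed by the additional views, so the metaclassifier sees all of these signals at once.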

Files (747.1 kB)

Name	Size	MD5
SAC2021.pdf	747.1 kB	c44ae1c6604c0fe1ae0cb34ad965d9a2

Additional details

European Commission
ARIADNEplus – Advanced Research Infrastructure for Archaeological Data Networking in Europe - plus (grant 823914)

	All versions	This version
Views	351	351
Downloads	243	243
Data volume	189.0 MB	189.0 MB
