Word-Class Embeddings for Multiclass Text Classification

Moreo, Alejandro; Esuli, Andrea; Sebastiani, Fabrizio

doi:10.5281/zenodo.4468313

Published December 31, 2020 | Version v1

Journal article | Open access

1. Italian National Council of Research

Pre-trained word embeddings encode general word semantics and lexical regularities of natural language, and have proven useful across many NLP tasks, including word sense disambiguation, machine translation, and sentiment analysis. In supervised tasks such as multiclass text classification (the focus of this article) it seems appealing to enhance word representations with ad-hoc embeddings that encode task-specific information. We propose (supervised) word-class embeddings (WCEs), and show that, when concatenated to (unsupervised) pre-trained word embeddings, they substantially facilitate the training of deep-learning models in multiclass classification by topic. We show empirical evidence that WCEs yield a consistent improvement in multiclass classification accuracy, using six popular neural architectures and six widely used and publicly available datasets for multiclass text classification. One further advantage of this method is that it is conceptually simple and straightforward to implement. Our code that implements WCEs is publicly available at https://github.com/AlexMoreo/word-class-embeddings.
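To make the idea of concatenating a supervised, class-aware vector to a pre-trained vector concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not specify how WCEs are computed, so the sketch assumes a simple stand-in, namely the fraction of a word's training occurrences that fall in each class. The function names, the toy corpus, and the toy "pre-trained" vectors are all hypothetical.

```python
from collections import defaultdict


def word_class_embeddings(docs, labels, classes):
    """Toy stand-in for WCEs (an assumption, not the paper's formula).

    docs:    list of token lists
    labels:  one class name per document
    classes: ordered list of all class names
    Returns a dict mapping each word to a vector with one entry per class:
    the fraction of the word's occurrences observed in that class.
    """
    index = {c: i for i, c in enumerate(classes)}
    counts = defaultdict(lambda: [0.0] * len(classes))
    for tokens, label in zip(docs, labels):
        for tok in tokens:
            counts[tok][index[label]] += 1.0
    wce = {}
    for word, row in counts.items():
        total = sum(row)  # always > 0: a word is only added when it is seen
        wce[word] = [v / total for v in row]
    return wce


def concat_embeddings(pretrained, wce, n_classes):
    """Concatenate pre-trained and word-class vectors per word.

    Words missing from either source get a zero vector for that part.
    """
    dim = len(next(iter(pretrained.values())))
    vocab = set(pretrained) | set(wce)
    return {
        w: pretrained.get(w, [0.0] * dim) + wce.get(w, [0.0] * n_classes)
        for w in vocab
    }


# Hypothetical toy data, for illustration only.
classes = ["sport", "politics"]
docs = [["goal", "match"], ["vote", "match"], ["goal"]]
labels = ["sport", "politics", "sport"]
pretrained = {"goal": [0.1, 0.2], "match": [0.3, 0.4]}  # made-up 2-d vectors

wce = word_class_embeddings(docs, labels, classes)
combined = concat_embeddings(pretrained, wce, len(classes))
```

In this sketch each word ends up with a vector of length (pre-trained dimension + number of classes), which could then initialise the embedding layer of a neural classifier. Any real use would replace the frequency-based stand-in with the formulation given in the article and its public code.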

Files

main.pdf (9.8 MB), md5:d40104cddde510c530d3320fbc6a915e
