Using Pseudo-Labelled Data for Zero-Shot Text Classification

Wang, Congcong; Nulty, Paul; Lillis, David

doi:10.1007/978-3-031-08473-7_4

Published June 13, 2022 | Version v1

Conference paper Open

1. University College Dublin
2. Birkbeck, University of London

Existing Zero-Shot Learning (ZSL) techniques for text classification typically assign a label to a piece of text by building a matching model to capture the semantic similarity between the text and the label descriptor. This is expensive at inference time, as it requires the text paired with every label to be passed forward through the matching model. The existing approaches to alleviate this issue are based on exact-word matching between the label surface names and an unlabelled target-domain corpus to get pseudo-labelled data for model training, making them difficult to generalise to zero-shot classification in multiple domains. In this paper, we propose an approach called P-ZSC to leverage pseudo-labelled data for zero-shot text classification. Our approach generates the pseudo-labelled data through a matching algorithm between the unlabelled target-domain corpus and the label vocabularies, which consist of in-domain relevant phrases obtained via expansion from the label names. Evaluated on several benchmark datasets from a variety of domains, our system substantially outperforms the baseline systems, especially on datasets whose classes are imbalanced.
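To make the idea of matching an unlabelled corpus against label vocabularies more concrete, the following is a minimal illustrative sketch in Python. It is not the P-ZSC algorithm from the paper: the function name `pseudo_label`, the `min_score` threshold, the substring-counting score, and the example vocabularies are all assumptions made for illustration, and the label-vocabulary expansion step is not shown.

```python
from collections import Counter


def pseudo_label(texts, label_vocab, min_score=1):
    """Assign each text the label whose vocabulary phrases occur in it most often.

    Hypothetical helper for illustration only; the scoring is a crude
    case-insensitive substring count, not the matching algorithm of the paper.
    Assumes label_vocab contains at least one label.
    """
    labelled = []
    for text in texts:
        lowered = text.lower()
        scores = Counter()
        for label, phrases in label_vocab.items():
            # Count how many times any phrase of this label appears in the text.
            scores[label] = sum(lowered.count(p.lower()) for p in phrases)
        # Pick the highest-scoring label (ties go to the first label listed).
        best_label, best_score = max(scores.items(), key=lambda kv: kv[1])
        # Texts that match no vocabulary strongly enough stay unlabelled.
        if best_score >= min_score:
            labelled.append((text, best_label))
    return labelled


if __name__ == "__main__":
    # Assumed, hand-written vocabularies standing in for expanded label names.
    vocab = {
        "sports": ["football", "match"],
        "politics": ["election", "parliament"],
    }
    corpus = [
        "The football match ended in a draw.",
        "Parliament called an early election.",
        "It rained all day.",
    ]
    for text, label in pseudo_label(corpus, vocab):
        print(label, "<-", text)
```

The resulting (text, label) pairs could then serve as training data for an ordinary classifier, which needs only one forward pass per text at inference time rather than one per text-label pair.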

Files

Name	Size
Wang2022.pdf md5:98187d2efb208649e9efed5481f34fc2	319.7 kB

	All versions	This version
Views	56	56
Downloads	128	127
Data volume	41.2 MB	40.9 MB
