Published June 13, 2022 | Version v1
Conference paper | Open access

Using Pseudo-Labelled Data for Zero-Shot Text Classification

  • 1. University College Dublin
  • 2. Birkbeck, University of London

Description

Existing Zero-Shot Learning (ZSL) techniques for text classification typically assign a label to a piece of text by building a matching model that captures the semantic similarity between the text and the label descriptor. This is expensive at inference time, as it requires the text to be paired with every label and each pair to be passed forward through the matching model. Existing approaches to alleviating this cost rely on exact-word matching between the label surface names and an unlabelled target-domain corpus to obtain pseudo-labelled data for model training, which makes them difficult to generalise to zero-shot classification across multiple domains. In this paper, we propose an approach called P-ZSC that leverages pseudo-labelled data for zero-shot text classification. Our approach generates the pseudo-labelled data through a matching algorithm between the unlabelled target-domain corpus and label vocabularies consisting of in-domain relevant phrases obtained by expanding the label names. Evaluated on several benchmark datasets from a variety of domains, our system substantially outperforms the baseline systems, especially on datasets whose classes are imbalanced.
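
To make the idea of pseudo-labelling from label vocabularies concrete, the following is a minimal sketch and not the P-ZSC algorithm itself, whose details are given in the paper rather than in this description. It assumes the expanded label vocabularies are already available as plain phrase lists, scores each unlabelled document by counting phrase occurrences, and keeps only documents with a single clear winner. The names `normalize`, `count_phrase`, `pseudo_label`, and `min_score`, as well as the toy vocabularies, are hypothetical and chosen only for illustration.

```python
import re


def normalize(text):
    """Lowercase the text and keep only alphanumeric tokens, joined by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def count_phrase(normalized_text, phrase):
    """Count whole-word occurrences of a (possibly multi-word) phrase."""
    norm = normalize(phrase)
    if not norm:
        return 0
    pattern = r"\b" + re.escape(norm) + r"\b"
    return len(re.findall(pattern, normalized_text))


def pseudo_label(corpus, label_vocab, min_score=1):
    """Return (document, label) pairs for documents with one clearly best label.

    Assumption: label_vocab maps each label to phrases that were expanded
    from the label name beforehand; the expansion step is not shown here.
    """
    labelled = []
    for doc in corpus:
        text = normalize(doc)
        scores = {
            label: sum(count_phrase(text, p) for p in phrases)
            for label, phrases in label_vocab.items()
        }
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        if not ranked:
            continue
        best_label, best = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        # Skip documents with too little evidence or a tie between labels.
        if best >= min_score and best > runner_up:
            labelled.append((doc, best_label))
    return labelled


# Toy, hand-written vocabularies (illustrative only).
label_vocab = {
    "sports": ["football", "world cup", "match"],
    "business": ["stock market", "shares", "profit"],
}
corpus = [
    "The world cup match drew a record crowd.",
    "Shares fell as the stock market reacted to weak profit.",
    "The weather was mild today.",
    "The football club reported a profit.",
]
pseudo_labelled = pseudo_label(corpus, label_vocab)
```

In this toy run, the first two documents receive a label, while the third (no matching phrase) and the fourth (a tie between the two labels) are left out. The resulting pairs could then serve as training data for an ordinary classifier, so that inference needs a single forward pass per text rather than one pass per text-label pair.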

Files

Wang2022.pdf

Size: 319.7 kB
md5:98187d2efb208649e9efed5481f34fc2