Published December 4, 2020 | Version v1
Conference paper Open

Hybrid Tagger – An Industry-driven Solution for Extreme Multi-label Text Classification

  • 1. Texta OU

Description

This paper presents an industry-driven solution for extreme multi-label text classification over a massive label collection. The proposed approach combines a large number of binary classification models with label pre-filtering, and relies on methods and technologies shown to be applicable in industrial scenarios where high-end computational hardware is limited. The system is evaluated on a dataset of Estonian newspaper articles containing almost 2000 unique labels, and is shown to run over 80 times faster than applying the binary models of the entire label set, with no negative impact on prediction scores.
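The combination of label pre-filtering and per-label binary models can be illustrated with a minimal sketch. The description does not say how candidate labels are selected, so the pre-filter below (labels of the most similar documents by token overlap) is an assumption made purely for illustration, as are the toy corpus, the keyword-based stand-in models, and all function names. It is not the authors' implementation.

```python
from typing import Callable, Dict, List, Set, Tuple

# Toy labelled corpus (hypothetical data, not taken from the paper).
CORPUS: List[Tuple[str, Set[str]]] = [
    ("parliament votes on new tax law", {"politics", "economy"}),
    ("football team wins national cup", {"sports"}),
    ("central bank raises interest rates", {"economy"}),
    ("minister resigns after election defeat", {"politics"}),
]


def tokens(text: str) -> Set[str]:
    """Lowercase whitespace tokenisation, returned as a set."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Token-set overlap in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def prefilter_labels(text: str, corpus: List[Tuple[str, Set[str]]], top_k: int = 2) -> Set[str]:
    """Assumed pre-filter: gather labels of the top_k most similar documents."""
    query = tokens(text)
    ranked = sorted(corpus, key=lambda doc: jaccard(query, tokens(doc[0])), reverse=True)
    candidates: Set[str] = set()
    for doc_text, labels in ranked[:top_k]:
        if jaccard(query, tokens(doc_text)) > 0:  # ignore documents with no overlap
            candidates |= labels
    return candidates


def make_keyword_model(keywords: List[str]) -> Callable[[str], bool]:
    """Stand-in for a trained binary classifier: fires if any keyword is present."""
    kw = set(keywords)
    return lambda text: bool(tokens(text) & kw)


# One binary model per label; a real system would hold thousands of these.
MODELS: Dict[str, Callable[[str], bool]] = {
    "politics": make_keyword_model(["parliament", "minister", "election"]),
    "economy": make_keyword_model(["tax", "bank", "rates"]),
    "sports": make_keyword_model(["football", "cup"]),
}


def hybrid_tag(text: str, corpus, models, top_k: int = 2) -> List[str]:
    """Apply only the binary models whose labels survive the pre-filter."""
    candidates = prefilter_labels(text, corpus, top_k)
    return sorted(label for label in candidates if label in models and models[label](text))
```

In this sketch the saving comes from `hybrid_tag` evaluating only the candidate labels' models rather than every entry in `MODELS`. With a label set in the thousands, a small candidate set is what would make a large speed-up plausible.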

Files

Vaik_LREC2020.pdf (344.9 kB)
md5:d49ea715f606e35ba8177128f0d9ef56

Additional details

Funding

European Commission: EMBEDDIA – Cross-Lingual Embeddings for Less-Represented Languages in European News Media (grant 825153)