Vaik, Kristiina
Asula, Marit
Sirel, Raul
2020-12-04
<p>This paper presents an industry-driven solution for extreme multi-label classification with a massive label collection. The proposed approach combines a large number of binary classification models with label pre-filtering and employs methods and technologies shown to be applicable in industrial scenarios where high-end computational hardware is limited. The system is evaluated on an Estonian newspaper article dataset containing almost 2000 unique labels and is shown to run over 80 times faster than applying the binary models for the entire label set, without a negative impact on prediction scores.</p>
https://doi.org/10.5281/zenodo.4306169
oai:zenodo.org:4306169
Zenodo
https://zenodo.org/communities/embeddia
https://zenodo.org/communities/eu
https://doi.org/10.5281/zenodo.4306168
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
LREC2020, In Proceedings of the LREC2020 Industry Track, Marseille, France, 11-16 May
text classification
extreme multi-label classification
data processing workflows
Hybrid Tagger – An Industry-driven Solution for Extreme Multi-label Text Classification
info:eu-repo/semantics/conferencePaper