Published October 4, 2021 | Version v1
Conference paper | Open Access

An evaluation of BERT and Doc2Vec model on the IPTC Subject Codes prediction dataset

  • 1. TRIKODER DOO, Zagreb, Croatia
  • 2. University of Ljubljana, Ljubljana, Slovenia
  • 3. Jožef Stefan Institute, Ljubljana, Slovenia

Description

Large pretrained language models like BERT have shown excellent generalization properties and have advanced the state of the art on various NLP tasks. In this paper we evaluate the Finnish BERT (FinBERT) model on the IPTC Subject Codes prediction task. We compare it to a simpler Doc2Vec model used as a baseline. Due to the hierarchical nature of the IPTC Subject Codes, we also evaluate the effect of encoding the hierarchy in the network layer topology. Contrary to our expectations, the simpler baseline Doc2Vec model clearly outperforms the more complex FinBERT model, and our attempts to encode the hierarchy in the prediction network do not yield a systematic improvement.
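For readers unfamiliar with the baseline, the sketch below shows how a Doc2Vec model of the kind described in the abstract can be trained and its document vectors fed to a flat classifier. This is a minimal illustration assuming gensim and scikit-learn; the toy corpus, labels, and hyperparameters are invented for the example and do not reflect the paper's data or settings.

```python
# Minimal Doc2Vec baseline sketch (illustrative data and settings only).
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.linear_model import LogisticRegression

docs = [
    ("talks on trade tariffs resumed today", "economy"),
    ("the midfielder scored twice in the final", "sport"),
    ("parliament passed the new budget bill", "politics"),
    ("the striker was injured during training", "sport"),
]

tagged = [TaggedDocument(words=text.split(), tags=[str(i)])
          for i, (text, _) in enumerate(docs)]

# Train document embeddings (tiny settings to suit the toy corpus).
model = Doc2Vec(vector_size=50, min_count=1, epochs=40)
model.build_vocab(tagged)
model.train(tagged, total_examples=model.corpus_count, epochs=model.epochs)

# Use the learned document vectors as features for a flat classifier,
# standing in for IPTC Subject Code prediction.
X = [model.dv[str(i)] for i in range(len(docs))]
y = [label for _, label in docs]
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Infer a vector for an unseen document and predict its label.
vec = model.infer_vector("the coach praised the team".split())
print(clf.predict([vec]))
```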
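The abstract does not specify how the hierarchy was encoded in the network layer topology, so the following PyTorch sketch shows just one plausible scheme: a two-level prediction head in which the top-level code logits are fed into the subcode classifier. All dimensions and names here are illustrative assumptions, not the paper's architecture (the 17 top-level categories match the IPTC Subject Code scheme, but the input and leaf sizes are invented).

```python
# One plausible way to encode a two-level label hierarchy in the
# prediction head; shapes and names are assumptions for illustration.
import torch
import torch.nn as nn

class HierarchicalHead(nn.Module):
    def __init__(self, in_dim, n_top, n_leaf):
        super().__init__()
        self.top = nn.Linear(in_dim, n_top)            # top-level codes
        self.leaf = nn.Linear(in_dim + n_top, n_leaf)  # subcodes see top logits

    def forward(self, doc_vec):
        top_logits = self.top(doc_vec)
        # Feed the (soft) top-level prediction into the subcode classifier.
        leaf_in = torch.cat([doc_vec, torch.softmax(top_logits, dim=-1)], dim=-1)
        return top_logits, self.leaf(leaf_in)

head = HierarchicalHead(in_dim=768, n_top=17, n_leaf=200)
x = torch.randn(4, 768)  # e.g. a batch of document embeddings
top, leaf = head(x)
print(top.shape, leaf.shape)
```

Conditioning the fine-grained classifier on the softmaxed coarse prediction is one common way to reflect a label hierarchy in layer topology; the paper may well have used a different construction.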

Files (422.6 kB)

Pranjicetal.pdf (422.6 kB)
md5:7345af9f6c4af4ac18e49ebc9aa7b3c0

Additional details

Funding

European Commission
EMBEDDIA - Cross-Lingual Embeddings for Less-Represented Languages in European News Media (grant agreement No. 825153)