Published August 12, 2024 | Version v1
Dataset Open

Named entity recognition (NER) on historical DBNL data

  • 1. National Library of the Netherlands

Description

The dataset contains the public-domain part of DBNL (3542
publications) annotated with three NER models. The models were
fine-tuned on Dutch historical data and are described in this paper:
https://aclanthology.org/2024.lt4hala-1.4/. The dataset consists of four
parts: the predictions of each of the three NER models, and the
predictions obtained by combining the models through majority voting.
It can be used as silver data for training entity linking models, or in
digital humanities applications. The annotations are available as TSV
files in the standard IOB format (each token is labelled as beginning,
inside, or outside of a named entity).
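
To illustrate the IOB format, the following is a minimal sketch of how such a file could be read and grouped into entity spans. The column layout (token in the first column, tag in the second), the `B-`/`I-` tag prefixes with a type suffix, and the `PER`/`LOC` labels in the sample are assumptions made for illustration; check the actual files before relying on them. The function names are hypothetical.

```python
import csv
import io


def read_iob_tsv(handle, token_col=0, tag_col=1):
    """Yield (token, tag) pairs from a tab-separated IOB file.

    The column positions are assumptions, not taken from the dataset.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank or short rows (some IOB files separate sentences this way).
        if len(row) <= max(token_col, tag_col):
            continue
        yield row[token_col], row[tag_col]


def iob_to_spans(pairs):
    """Group (token, tag) pairs into (entity_type, text) spans."""
    spans = []
    current_type, current_tokens = None, []

    def flush():
        nonlocal current_type, current_tokens
        if current_tokens:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in pairs:
        prefix, _, label = tag.partition("-")
        if prefix == "B" and label:
            # A B- tag always starts a new entity.
            flush()
            current_type, current_tokens = label, [token]
        elif prefix == "I" and label and label == current_type:
            # An I- tag of the same type continues the open entity.
            current_tokens.append(token)
        elif prefix == "I" and label:
            # A stray I- tag is treated as the start of a new entity.
            flush()
            current_type, current_tokens = label, [token]
        else:
            # "O" (or anything unrecognised) closes the open entity.
            flush()
    flush()
    return spans


# Hypothetical sample, not taken from the dataset.
sample = "Willem\tB-PER\nvan\tI-PER\nOranje\tI-PER\nwoonde\tO\nin\tO\nDelft\tB-LOC\n"
spans = iob_to_spans(read_iob_tsv(io.StringIO(sample)))
print(spans)  # [('PER', 'Willem van Oranje'), ('LOC', 'Delft')]
```

For a real file, pass an open file handle (for example `open(path, encoding="utf-8", newline="")`) instead of the `io.StringIO` sample.
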
Acknowledgements: support with the data and research was provided
by Marieke Moolenaar.
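
As an illustration of the majority-voting combination mentioned in the description, the sketch below votes per token over three aligned tag sequences. This is one plausible reading, not necessarily the procedure used to build the dataset: the token-level granularity, the fallback to `O` when no tag has a strict majority, and the function names are all assumptions.

```python
from collections import Counter


def majority_vote(tags, fallback="O"):
    """Return the tag chosen by more than half of the models, else the fallback.

    The fallback to "O" on disagreement is an assumption for illustration.
    """
    label, count = Counter(tags).most_common(1)[0]
    return label if count > len(tags) / 2 else fallback


def combine_predictions(*model_tags):
    """Combine aligned per-token tag sequences from several models."""
    lengths = {len(seq) for seq in model_tags}
    if len(lengths) != 1:
        raise ValueError("all models must label the same token sequence")
    return [majority_vote(votes) for votes in zip(*model_tags)]


# Hypothetical predictions from three models for the same four tokens.
model_a = ["B-PER", "I-PER", "O", "B-LOC"]
model_b = ["B-PER", "I-PER", "O", "O"]
model_c = ["B-PER", "O", "B-LOC", "B-LOC"]
combined = combine_predictions(model_a, model_b, model_c)
print(combined)  # ['B-PER', 'I-PER', 'O', 'B-LOC']
```

Voting token by token can produce sequences that are not valid IOB (for example an `I-` tag with no preceding `B-` tag), so a real pipeline would likely need a repair step or span-level voting.
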

Files

Files (2.5 GB)

md5:9e6592cd58696f20d566467b9d789fea