Conference paper Open Access

A Multilingual Dataset for Named Entity Recognition, Entity Linking and Stance Detection in Historical Newspapers

Hamdi, Ahmed; Linhares Pontes, Elvys; Boros, Emanuela; Tuyet Hai Nguyen, Thi; Hackl, Günter; Moreno, Jose G.; Doucet, Antoine

Named entity processing over historical texts is increasingly used due to the massive volume of documents and archives stored in digital libraries. However, due to the scarcity of annotated historical resources, information extraction performance falls behind that on contemporary texts. In this paper, we introduce the NewsEye resource, a multilingual dataset for named entity recognition and linking, enriched with stances towards named entities. The dataset comprises diachronic historical newspaper material published between 1850 and 1950 in French, German, Finnish, and Swedish. Such a historical resource is essential for developing and evaluating named entity processing systems. It also helps improve the robustness of existing approaches on historical documents, which enables adequate and efficient semantic indexing of historical documents in digital cultural heritage collections.

Files (1.3 MB)
Name: SIGIR_2021_NewsEye_Resource.pdf
Size: 1.3 MB
md5:7d6f3e838903d0c11f915761a5ec2056
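Views: 279 (all versions), 279 (this version)
Downloads: 266 (all versions), 266 (this version)
Data volume: 351.2 MB (all versions), 351.2 MB (this version)
Unique views: 243 (all versions), 243 (this version)
Unique downloads: 242 (all versions), 242 (this version)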
