Conference paper Open Access

Can Umlauts Ruin Your Research in Digitized Newspaper Collections? A NewsEye Case Study on 'The Dark Sides of War' (1914–1918)

Klaus, Barbara

Digitized newspaper collections facilitate access to historical newspapers. Although they offer many useful possibilities for research in historical newspapers and magazines, (automatic) research in these collections is (still) full of limitations and pitfalls. Drawing on research conducted on the platform AustriaN Newspapers Online (ANNO) for the NewsEye case study ‘The Dark Sides of War’, this paper discusses the main challenges of working with digitized newspaper collections. Two aspects in particular serve as specific examples: the fire catastrophe at the Wöllersdorf munitions factory in Lower Austria (18 September 1918) and the Austrian press coverage of war widows during the First World War. The limitations discussed include Optical Character Recognition (OCR) quality and the search options and metadata provided, among others. Furthermore, the paper presents possible improvements that address these challenges, e.g. Optical Layout Recognition (OLR), Named-Entity Recognition (NER) and Named-Entity Linking (NEL).
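The title's question can be made concrete with a small sketch. If OCR output or period spelling renders "Wöllersdorf" as "Wollersdorf" or "Woellersdorf", a single exact-match query can silently miss relevant articles. The sketch below expands a search term into umlaut spelling variants. It assumes a hypothetical substitution table (`UMLAUT_VARIANTS`) chosen for illustration. The table is not derived from ANNO, and this is not the method used in the paper or by NewsEye.

```python
import unicodedata

# Hypothetical substitution table: ways an umlaut (or ß) might surface in
# OCR output or alternative spellings. Illustrative only, not from ANNO.
UMLAUT_VARIANTS = {
    "ä": ["ä", "a", "ae"],
    "ö": ["ö", "o", "oe"],
    "ü": ["ü", "u", "ue"],
    "ß": ["ß", "ss"],
}


def query_variants(term: str) -> set[str]:
    """Return lower-case spelling variants of `term`, expanding every umlaut."""
    # NFC normalization makes a decomposed "o" + combining diaeresis equal to "ö".
    term = unicodedata.normalize("NFC", term.lower())
    variants = {""}
    for ch in term:
        options = UMLAUT_VARIANTS.get(ch, [ch])
        # Combine every existing prefix with every option for this character.
        variants = {prefix + opt for prefix in variants for opt in options}
    return variants


if __name__ == "__main__":
    print(sorted(query_variants("Wöllersdorf")))
```

The number of variants grows multiplicatively with the number of umlauts in a term, so this kind of expansion suits short queries such as place names better than long phrases.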
