Conference paper Open Access
Digitized newspaper collections facilitate the access to historical newspapers. Even though they offer several useful possibilities regarding the research in historical newspapers and magazines, the (automatic) research in these collections is (still) full of limitations and pitfalls. Based on the research conducted on the platform AustriaN Newspapers Online (ANNO) for the NewsEye case study ‘the dark sides of war’, the main challenges of working with digitized newspaper collections will be discussed in this paper. Especially two aspects – the fire catastrophe at the munitions factory Wöllersdorf (1918/09/18) in Lower Austria and the Austrian press coverage about war widows during the First World War – will be used as specific examples. The discussed limitations include the Optical Character Recognition (OCR) quality, provided search options and metadata, as well as others. Furthermore, possible improvements regarding these challenges, e.g. Optical Layout Recognition (OLR), Named-entity Recognition (NER) and Named-entity Linking (NEL), will be presented in this paper.
Can Umlauts Ruin Your Research in Digitized Newspaper Collections
|All versions||This version|
|Data volume||25.8 MB||25.8 MB|