Published May 3, 2022 | Version v1
Conference paper | Open access

Reusing the Model and Components of an IIR Study for Perceived Effects of OCR Quality Change

  • 1. University of Eastern Finland
  • 2. Tampere University
  • 3. Aalborg University
  • 4. National Library of Finland

Description

ABSTRACT

Historical newspapers are increasingly accessed digitally, for a variety of purposes, by both professional and lay users. These ever-growing historical collections are usually created with Optical Character Recognition (OCR), which may introduce noise into the texts. This noise in turn compromises information retrieval (IR) performance and user understanding. The effect of OCR noise on IR performance has been studied earlier by using texts with artificially degraded OCR quality (see, e.g., [2, 15]), by using a test collection containing documents with authentic low OCR quality [12], or by gathering end-user impressions [23]. However, it remains challenging to measure how the user’s subjective perception is affected by the amount of OCR noise remaining in the documents. Recently, the National Library of Finland has set up an experimental system that allows this issue to be studied. The system can present each underlying historical document in one of two alternative versions: one based on the baseline OCR quality, the other on the new, improved OCR quality. This setup facilitates studying the effects of OCR quality changes on the user’s subjective perception of the document.

Following Gäde et al. [8], we describe in this paper the research design, infrastructure, and research data used in a recent user experiment by Kettunen et al. [19], in which thirty-two test subjects performed simulated work tasks [4], and we discuss the prospects for reusing the experimental components of the study. So far, the system has been used in one experiment, in which the subjects performed simulated tasks. However, the research design and its general model could be used in the future to study the effects of OCR quality in professional settings, with historians performing naturalistic phases of their research tasks.

BIIRRR 2022

Third Workshop on Building towards Information Interaction and Retrieval Resources Re-use

Files

reuse_BIIR.pdf (386.7 kB)
md5:8b164e8cebfa29fa027551deb91a9849

Additional details

Funding

NewsEye: A Digital Investigator for Historical Newspapers (grant no. 770299)
European Commission