Reusing the Model and Components of an IIR Study for Perceived Effects of OCR Quality Change
- 1. University of Eastern Finland
- 2. Tampere University
- 3. Aalborg University
- 4. National Library of Finland
Description
ABSTRACT
Historical newspapers are increasingly accessed digitally for different purposes by both professional and lay users. These ever-growing historical collections are usually created using Optical Character Recognition (OCR), which may introduce noise into the texts. This noise in turn compromises information retrieval (IR) performance and user understanding. The effect of OCR noise on IR performance has been studied earlier by using texts with artificially degraded OCR quality (see, e.g., [2, 15]), by using a test collection containing documents with authentic low OCR quality [12], or by gathering end-user impressions [23]. However, it remains challenging to measure how the user’s subjective perception is affected by the amount of OCR noise remaining in the documents. Recently, the National Library of Finland has set up an experimental system that allows studying this issue. The system can present each underlying historical document in two alternative versions: one based on the baseline OCR quality and one based on the new, improved OCR quality. This setup facilitates studying the effects of OCR quality changes on the user’s subjective perception of the document.
Following Gäde et al. [8], we describe in this paper the research design, infrastructure, and research data used in a recent user experiment by Kettunen et al. [19], in which thirty-two test subjects performed simulated work tasks [4], and we discuss the prospects for reusing the experimental components of the study. So far, the system has been used in one experiment in which the subjects performed simulated tasks. However, the research design and its general model could be used in the future to study the effects of OCR quality in professional settings, with historians performing naturalistic phases of their research tasks.
Third Workshop on Building towards Information Interaction and Retrieval Resources Re-use
Files
reuse_BIIR.pdf (386.7 kB)
md5:8b164e8cebfa29fa027551deb91a9849