Published November 7, 2025 | Version v1
Conference paper Open

Enhancing YOLO Models for Handwritten Text Recognition

Authors/Creators

  • 1. HSE University

Description

This paper develops computer vision models that detect and recognize handwritten text simultaneously. The YOLOv8 family of object detection architectures serves as the starting point. We formulate line detection as a different task depending on the shape of the text: straight lines as object detection, slanted lines as oriented bounding box (OBB) detection, and curved lines as instance segmentation. For each model variant, a suitable pooling procedure is developed that extracts a feature description from within the bounding box or mask. For instance segmentation, a modified segmentation mechanism is proposed that takes into account the properties of lines as graphic objects and operates on a geometric principle. To recognize the formatting of handwritten text, in particular strikethrough and underlining, the alphabet is extended and two components, a symbol and its style, are predicted separately. The methods are evaluated on original real-world data: a set of diary pages of the Russian statesman Modest Andreevich Korf (1800–1876), which is a valuable historical source. All models achieve a character error rate (CER) of about 3–4%, which makes the recognized text easily readable by a person. Recognition quality improves as model complexity increases, which justifies considering several variants of the problem. The code is available at https://github.com/nlomov/yolo-htr.
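The description reports quality as a character error rate (CER) but does not say how it is computed. The sketch below assumes the common definition: the Levenshtein (edit) distance between the reference and the recognized text, divided by the length of the reference. The function names `edit_distance` and `cer` are illustrative and are not taken from the linked repository.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning ref into hyp."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate under the assumed definition:
    edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Under this definition, a CER of 3–4% corresponds to roughly three or four character edits per hundred reference characters. The value can exceed 1.0 when the recognized text is much longer than the reference.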

Files

Lom.pdf (5.0 MB)
md5:3e4e05a127c2595780ef2f0862d23064