
Published April 25, 2007 | Version 13525
Journal article, open access

A Study of Touching Characters in Degraded Gurmukhi Text

Description

Character segmentation is an important preprocessing step for text recognition. In degraded documents, touching characters drastically reduce the recognition rate of any optical character recognition (OCR) system. In this paper, touching Gurmukhi characters are studied and, after careful analysis, divided into categories defined by the structural properties of Gurmukhi characters. New algorithms are proposed to segment touching characters in the middle zone. These algorithms show a reasonable improvement in segmenting touching characters in degraded Gurmukhi script. They apply only to machine-printed text.
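The abstract does not describe the proposed algorithms, so the following is only a point of reference: a minimal sketch of a common projection-profile baseline, not the authors' method. It assumes a binarized word image (1 = ink, 0 = background) in which the headline is the row with the most ink plus any directly following rows that are nearly as dense. Everything below the headline is treated as the middle zone; the lower zone is not separated. A touching pair is then cut at the column with the least ink. The function names and the `frac` and `min_width` parameters are hypothetical.

```python
import numpy as np


def middle_zone(word, frac=0.9):
    """Drop the headline (shirorekha) and everything above it.

    Hypothetical heuristic: the headline is the row with the most ink plus
    any directly following rows holding at least `frac` of that ink.
    Everything below is returned; the lower zone is not separated here.
    """
    rows = word.sum(axis=1)
    top = int(np.argmax(rows))
    bottom = top
    while bottom + 1 < len(rows) and rows[bottom + 1] >= frac * rows[top]:
        bottom += 1
    return word[bottom + 1:]


def find_cut(middle, min_width):
    """Column with the least ink (vertical projection minimum).

    Columns closer than `min_width` to either edge are excluded so neither
    piece is narrower than a plausible character. Returns None when the
    block is too narrow to hold two such pieces.
    """
    profile = middle.sum(axis=0)
    width = middle.shape[1]
    if width <= 2 * min_width:
        return None
    window = profile[min_width:width - min_width]
    # np.argmin returns the first minimum when several columns tie.
    return min_width + int(np.argmin(window))


def split_touching(middle, min_width):
    """Split a touching pair at the cut column, or return None."""
    cut = find_cut(middle, min_width)
    if cut is None:
        return None
    return middle[:, :cut], middle[:, cut:]


if __name__ == "__main__":
    # Toy word: 2-row headline over two 5-column blocks joined by a thin bridge.
    word = np.zeros((8, 12), dtype=np.uint8)
    word[0:2, :] = 1
    word[2:, 0:5] = 1
    word[2:, 7:12] = 1
    word[5, 5:7] = 1
    left, right = split_touching(middle_zone(word), min_width=3)
    print(left.shape, right.shape)  # prints (6, 5) (6, 7)
```

On real degraded scans the minimum-ink column can fall inside a character, for example through a thin stroke, so a pure projection cut is usually not enough on its own; the abstract indicates the paper instead organizes touching characters into categories based on the structural properties of Gurmukhi characters.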

Files

13525.pdf (240.1 kB)
md5:fb7b4507c066302e54ec35cd79ab79b5
