Published November 20, 2018 | Version v1
Dataset Open

CVL Ruling Database

  • 1. TU Wien

Description

The CVL ruling dataset was synthetically generated to allow for comparing different ruling removal methods. It is based on the ICDAR 2013 Handwriting Segmentation database [1]. It was generated by synthetically adding four different ruling images resulting in a total of 600 test images. The pixel values are:

  • 255 background
  • 155 ruling
  • 100 text
  • 0 ruling and text (overlaping)

For processing, a binary image must be generated which sets all pixels to 0 that are not 255. When evaluating, the line GT image can be found by setting all pixel having value 155 to one (e.g. linImg = img == 155). The text GT image can be extracted by setting all values below 155 to zero (e.g. txtImg = img < 155). Then, true positives (tp), false positives (fp) and false negatives (fn) are defined as:

  • tp = result & linImg & !txtImg
  • fp = result & !txtImg
  • fn = !result & linImg & !txtImg

The database ships with a Matlab that gives evaluation results if all images are already processed.

Files

cvl-ruling-database.zip

Files (83.9 MB)

Name Size Download all
md5:8a1599d56f59d188f9a9f73525df805c
83.9 MB Preview Download

Additional details

References

  • Markus Diem, Florian Kleber and Robert Sablatnig. Ruling Analysis and Classification of Torn Documents. In ACM Symposium on Document Engineering. Colorado, USA, pages 63 – 72 2014.
  • N. Stamatopoulos, B. Gatos, G. Louloudis, U. Pal and A. Alaei. ICDAR 2013 Handwriting Segmentation Contest. In Proceedings of the 12th International Conference on Document Analysis and Recognition, 2013, 1402-1406