Dataset Open Access
The CVL ruling dataset was synthetically generated to allow for comparing different ruling removal methods. It is based on the ICDAR 2013 Handwriting Segmentation database . It was generated by synthetically adding four different ruling images resulting in a total of 600 test images. The pixel values are:
For processing, a binary image must be generated which sets all pixels to 0 that are not 255. When evaluating, the line GT image can be found by setting all pixel having value 155 to one (e.g. linImg = img == 155). The text GT image can be extracted by setting all values below 155 to zero (e.g. txtImg = img < 155). Then, true positives (tp), false positives (fp) and false negatives (fn) are defined as:
The database ships with a Matlab that gives evaluation results if all images are already processed.
Markus Diem, Florian Kleber and Robert Sablatnig. Ruling Analysis and Classification of Torn Documents. In ACM Symposium on Document Engineering. Colorado, USA, pages 63 – 72 2014.
N. Stamatopoulos, B. Gatos, G. Louloudis, U. Pal and A. Alaei. ICDAR 2013 Handwriting Segmentation Contest. In Proceedings of the 12th International Conference on Document Analysis and Recognition, 2013, 1402-1406