ScriptNet: ICDAR 2017 Competition on Baseline Detection in Archival Documents (cBAD)

Diem, Markus; Kleber, Florian; Fiel, Stefan; Grüning, Tobias; Gatos, Basilis

doi:10.5281/zenodo.1491441

Published January 23, 2017 | Version v4

Dataset Open

1. Computer Vision Lab, TU Wien
2. Computational Intelligence Technology Lab, University of Rostock
3. Computational Intelligence Laboratory, National Center of Scientific Research Demokritos

This dataset contains the training and test set for the ICDAR 2017 Competition on Baseline Detection in Archival Documents (cBAD).

cBAD is based on a newly created, freely available real-world dataset of 2035 annotated document page images collected from 9 different archives. Two competition tracks test different characteristics of the submitted methods. Track A [Simple Documents] is published with annotated text regions and therefore tests a method's quality of text line segmentation. The more challenging Track B [Complex Documents] provides only the page area, so baseline detection algorithms must correctly locate text lines in the presence of marginalia, tables, and noise.

The dataset comprises images with additional PAGE XMLs. The PAGE XMLs contain text regions and baseline annotations.
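As a rough illustration of how the baseline annotations might be read, the sketch below collects the baselines from one PAGE XML document. It assumes the usual PAGE convention of a `Baseline` element whose `points` attribute holds space-separated integer `x,y` pairs; the helper names, the sample document, and its namespace URI are made up for this example and are not taken from the dataset itself.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix so the code works for any PAGE schema version."""
    return tag.rsplit("}", 1)[-1]


def read_baselines(xml_text):
    """Return one list of (x, y) integer tuples per Baseline element found."""
    root = ET.fromstring(xml_text)
    baselines = []
    for elem in root.iter():
        if local_name(elem.tag) != "Baseline":
            continue
        # Assumption: points look like "x1,y1 x2,y2 ..." with integer coordinates.
        pairs = elem.get("points", "").split()
        coords = [tuple(int(v) for v in pair.split(",")) for pair in pairs]
        if coords:
            baselines.append(coords)
    return baselines


# Hypothetical minimal PAGE XML document, used only to exercise the parser.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.jpg" imageWidth="1000" imageHeight="800">
    <TextRegion id="r1">
      <Coords points="10,10 900,10 900,200 10,200"/>
      <TextLine id="l1">
        <Baseline points="20,100 400,102 880,98"/>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

baselines = read_baselines(SAMPLE)
```

Matching on the local element name instead of a fixed namespace is a deliberate choice here, since the exact schema version used by the files is not stated in this record.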

Competition Website: https://scriptnet.iit.demokritos.gr/competitions/5/

Version 3 is the version used in the cBAD competition.

Version 4 additionally contains the page region and, for double pages, the page split marked as a separator.

Notes

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 674943.

Files

Name	Size
READ-ICDAR2017-cBAD-dataset-v4.zip (md5:5004e86616187cae3f2d177baaabbfe4)	4.2 GB
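Since the archive is several gigabytes, it may be worth checking a download against the MD5 checksum listed for it. The following is a minimal sketch using only the Python standard library; the function names and the chunk size are choices made for this example, and the file is read in chunks so that it never has to fit in memory.

```python
import hashlib

# Checksum as listed for READ-ICDAR2017-cBAD-dataset-v4.zip in this record.
EXPECTED_MD5 = "5004e86616187cae3f2d177baaabbfe4"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected hex string."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify("READ-ICDAR2017-cBAD-dataset-v4.zip")` on the downloaded file should return `True` if the archive arrived intact; the path is assumed to point at wherever the file was saved.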

Additional details

European Commission
READ - Recognition and Enrichment of Archival Documents 674943

M. Diem, F. Kleber, S. Fiel, T. Grüning, and B. Gatos, "cBAD: ICDAR2017 Competition on Baseline Detection," in Proceedings of the International Conference on Document Analysis and Recognition, 2017, in press.
T. Grüning, R. Labahn, M. Diem, F. Kleber, and S. Fiel, "READ-BAD: A new dataset and evaluation scheme for baseline detection in archival documents," CoRR, vol. abs/1705.03311, 2017. [Online]. Available: http://arxiv.org/abs/1705.03311

	All versions	This version
Views	6,328	1,427
Downloads	2,471	703
Data volume	30.9 TB	12.3 TB
