ScriptNet: ICDAR 2017 Competition on Baseline Detection in Archival Documents (cBAD)

Diem, Markus; Kleber, Florian; Fiel, Stefan; Grüning, Tobias; Gatos, Basilis

doi:10.5281/zenodo.257972

Published January 23, 2017 | Version v1

Dataset Open

1. TU Wien
2. CITlab
3. NCSR Demokritos

This dataset contains the training and test set for the ICDAR 2017 Competition on Baseline Detection in Archival Documents (cBAD).

Two newly created, freely available, real-world datasets form the basis of the competition, which has two tracks. The first track deals with basic baseline detection in handwritten text in paragraph form. In total, 750 pages of handwritten archival documents (without tables or marginalia) have been prepared with manually annotated baselines and text regions (paragraphs). The second track consists of more challenging data, including tables, marginalia, and noisy document images; text lines can be skewed by up to 180°. About 1200 pages of archival documents (handwritten and printed) have been manually annotated for this track. For both tracks, the images come from 9 different archives and document collections.

The training set comprises images with accompanying PAGE XML files, while the test set consists of images only. The PAGE XML files contain text regions, e.g., paragraphs.
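The PAGE XML ground truth can be read with Python's standard library alone. The following is a minimal sketch, not official cBAD tooling. It assumes the PAGE 2013 layout, where each TextRegion contains TextLine elements and each line has a Baseline element whose `points` attribute lists integer "x,y" pairs. Tags are matched by local name, so the exact namespace version does not matter. The function names and the sample page are hypothetical illustrations.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

Point = Tuple[int, int]


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree adds to qualified tags."""
    return tag.rsplit("}", 1)[-1]


def parse_points(points: str) -> List[Point]:
    """Parse a PAGE 'points' attribute such as '5,20 50,22 95,21'."""
    pairs = (p.split(",") for p in points.split())
    return [(int(x), int(y)) for x, y in pairs]


def read_baselines(root: ET.Element) -> Dict[str, List[List[Point]]]:
    """Map each TextRegion id to the baseline polylines of its text lines.

    Assumption: baselines are <Baseline points="..."/> elements nested
    inside the TextRegion (PAGE 2013 layout); namespaces are ignored.
    """
    regions: Dict[str, List[List[Point]]] = {}
    for region in root.iter():
        if local_name(region.tag) != "TextRegion":
            continue
        regions[region.get("id", "")] = [
            parse_points(el.get("points", ""))
            for el in region.iter()
            if local_name(el.tag) == "Baseline" and el.get("points")
        ]
    return regions


# Hypothetical minimal page for illustration; a real file would be read with
# read_baselines(ET.parse(path).getroot()).
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.jpg" imageWidth="100" imageHeight="60">
    <TextRegion id="r1">
      <Coords points="0,0 100,0 100,50 0,50"/>
      <TextLine id="l1">
        <Coords points="5,5 95,5 95,25 5,25"/>
        <Baseline points="5,20 50,22 95,21"/>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

print(read_baselines(ET.fromstring(SAMPLE)))
# {'r1': [[(5, 20), (50, 22), (95, 21)]]}
```

Returning plain lists of integer tuples keeps the result easy to hand to any geometry or drawing library. The sketch does not deduplicate, so if a file nests TextRegions inside one another, a nested region's baselines would also be listed under its parent.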

Competition Website: https://scriptnet.iit.demokritos.gr/competitions/5/

Notes

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 674943.

Files

Files (4.2 GB)

Name	MD5	Size
Test - Baseline Competition - Complex Documents - Clean.tar.gz	658f7ca141acbcae72bc0751d1fd0006	2.3 GB
Test - Baseline Competition - Simple Documents - Clean.tar.gz	a9df8de41af192fb18c930c25e48b18f	964.5 MB
Train - Baseline Competition - Complex Documents.tar.gz	1269d8b4a162d124a298966f48c770af	572.3 MB
Train - Baseline Competition - Simple Documents.tar.gz	1f92e998c74f8864d0e226e1a3c6078a	331.4 MB
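Each archive is listed with an MD5 checksum, so a corrupted or truncated download can be detected before extraction. The following is a minimal Python sketch of that check. It assumes all four archives were saved in one directory under the names shown in the file list. The helper names `md5sum` and `verify_downloads` are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict

# Checksums copied from the file list of this record (version v1).
EXPECTED_MD5: Dict[str, str] = {
    "Test - Baseline Competition - Complex Documents - Clean.tar.gz": "658f7ca141acbcae72bc0751d1fd0006",
    "Test - Baseline Competition - Simple Documents - Clean.tar.gz": "a9df8de41af192fb18c930c25e48b18f",
    "Train - Baseline Competition - Complex Documents.tar.gz": "1269d8b4a162d124a298966f48c770af",
    "Train - Baseline Competition - Simple Documents.tar.gz": "1f92e998c74f8864d0e226e1a3c6078a",
}


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-GB archives never sit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory: Path) -> Dict[str, bool]:
    """Report, per expected archive, whether it exists and its MD5 matches.

    Assumption: the archives were saved under their original file names.
    """
    return {
        name: (directory / name).is_file() and md5sum(directory / name) == expected
        for name, expected in EXPECTED_MD5.items()
    }


if __name__ == "__main__":
    # Check the current directory and print one status line per archive.
    for name, ok in verify_downloads(Path(".")).items():
        print("OK    " if ok else "FAILED", name)
```

Run as a script, this prints one OK or FAILED line per archive. A missing file is reported as FAILED rather than raising an error. The standard `md5sum` (Linux) or `md5` (macOS) command-line tools produce the same digests if you prefer not to use Python.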

Additional details

READ – Recognition and Enrichment of Archival Documents 674943: European Commission
