There is a newer version of the record available.

Published January 23, 2017 | Version v1
Dataset Open

ScriptNet: ICDAR 2017 Competition on Baseline Detection in Archival Documents (cBAD)

  • 1. TU Wien
  • 2. CITlab
  • 3. NCSR Demokritos

Description

This dataset contains the training and test set for the ICDAR 2017 Competition on Baseline Detection in Archival Documents (cBAD).

Two newly created, freely available, real-world datasets form the basis of the competition. There are two tracks of participation. The first track deals with basic baseline detection in handwritten text laid out in paragraphs. For this track, a total of 750 pages of handwritten archival documents (without tables or marginalia) have been prepared with manually annotated baselines and text regions (paragraphs). The second track consists of more challenging data, including tables, marginalia, and noisy document images; text lines can be skewed by up to 180°. For this track, about 1200 pages of archival documents (handwritten and printed) have been manually annotated. For both tracks, the images come from 9 different archives and document collections.

The training set comprises images with accompanying PAGE XML files, while the test set consists of images only. The PAGE XML files contain text regions, e.g. paragraphs.
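As a rough illustration of how the PAGE XML annotations might be read, the sketch below collects baseline polylines per text region using only the Python standard library. It assumes the usual PAGE layout in which a `TextRegion` holds `TextLine` elements, each with a `Baseline` whose `points` attribute is a space-separated list of integer `x,y` pairs. The embedded sample document, its file name, and the helper names `local_name`, `parse_points`, and `read_baselines` are hypothetical and not taken from the dataset. The namespace URI differs between PAGE schema versions, so the sketch ignores it.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix so any PAGE schema version matches."""
    return tag.rsplit("}", 1)[-1]


def parse_points(points):
    """Turn 'x1,y1 x2,y2 ...' into [(x1, y1), (x2, y2), ...] (integer coordinates assumed)."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def read_baselines(xml_text):
    """Map each TextRegion id to the list of baseline polylines found inside it."""
    root = ET.fromstring(xml_text)
    regions = {}
    for region in root.iter():
        if local_name(region.tag) != "TextRegion":
            continue
        baselines = []
        # region.iter() also visits nested elements, so baselines of nested
        # regions are included in the enclosing region as well.
        for element in region.iter():
            if local_name(element.tag) == "Baseline":
                baselines.append(parse_points(element.get("points", "")))
        regions[region.get("id")] = baselines
    return regions


# Hypothetical, hand-written sample; real files from the dataset will be larger.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.jpg" imageWidth="1000" imageHeight="800">
    <TextRegion id="r1">
      <Coords points="10,10 500,10 500,200 10,200"/>
      <TextLine id="l1">
        <Baseline points="20,50 250,52 480,55"/>
      </TextLine>
      <TextLine id="l2">
        <Baseline points="20,100 480,104"/>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

if __name__ == "__main__":
    for region_id, lines in read_baselines(SAMPLE).items():
        print(region_id, len(lines), "baselines")
```

To read a file from disk instead of the sample string, pass the file's text content to `read_baselines`. It is worth checking the element names against the PAGE schema version declared in the actual files first.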

Competition Website: https://scriptnet.iit.demokritos.gr/competitions/5/

Notes

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 674943.

Files

Files (4.2 GB)

Checksum                               Size
md5:658f7ca141acbcae72bc0751d1fd0006   2.3 GB
md5:a9df8de41af192fb18c930c25e48b18f   964.5 MB
md5:1269d8b4a162d124a298966f48c770af   572.3 MB
md5:1f92e998c74f8864d0e226e1a3c6078a   331.4 MB
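The MD5 checksums listed above can be used to check that a download arrived intact. The following is a minimal sketch using Python's standard `hashlib` module. The function names `md5_of_file` and `matches_checksum` are illustrative. Because this listing does not show file names, the path in the usage comment is a placeholder that must be replaced with the actual downloaded file.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or plain '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Usage (placeholder path, not a real file name from the record):
# ok = matches_checksum("path/to/downloaded_archive",
#                       "md5:658f7ca141acbcae72bc0751d1fd0006")
```

Reading in chunks keeps memory use constant, which matters for the multi-gigabyte archives in this record.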

Additional details

Funding

READ – Recognition and Enrichment of Archival Documents 674943
European Commission