ScriptNet: ICDAR 2017 Competition on Baseline Detection in Archival Documents (cBAD)
- 1. TU Wien
- 2. CITlab
- 3. NCSR Demokritos
Description
This dataset contains the training and test sets for the ICDAR 2017 Competition on Baseline Detection in Archival Documents (cBAD).
The competition is based on two newly created, freely available, real-world datasets, one for each of its two tracks. The first track deals with basic baseline detection in handwritten text laid out in paragraph form. For it, a total of 750 pages of handwritten archival documents (no tables or marginalia) were prepared with manually annotated baselines and text regions (paragraphs). The second track consists of more challenging data, including tables, marginalia, and noisy document images, and its text lines can be skewed by up to 180°. For it, about 1200 pages of archival documents (handwritten and printed) were manually annotated. For both tracks, the images come from 9 different archives and document collections.
The training set comprises images with accompanying PAGE XML files, while the test set consists of images only. The PAGE XML contains text regions, e.g. paragraphs.
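As an illustration of how the training annotations might be read, here is a minimal sketch that extracts baselines from a PAGE XML document using only the Python standard library. It assumes the files follow the usual PAGE layout, in which a `TextRegion` contains `TextLine` elements and each line carries a `Baseline` element with a `points` attribute of the form `x1,y1 x2,y2 ...`. The sample document, its namespace version, and the helper names (`local_name`, `parse_points`, `read_baselines`) are hypothetical and not taken from the dataset, so check them against the actual files.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real files may use a different PAGE schema version.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.jpg" imageWidth="1000" imageHeight="800">
    <TextRegion id="r1">
      <Coords points="0,0 500,0 500,300 0,300"/>
      <TextLine id="l1">
        <Baseline points="10,100 200,105"/>
      </TextLine>
      <TextLine id="l2">
        <Baseline points="10,150 120,152 210,158"/>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def parse_points(points):
    """Turn a 'points' string such as '10,20 30,40' into [(10, 20), (30, 40)]."""
    coords = []
    for pair in points.split():
        x, y = pair.split(",")
        coords.append((int(float(x)), int(float(y))))
    return coords


def read_baselines(xml_text):
    """Return (region_id, line_id, baseline_points) for each TextLine with a Baseline."""
    root = ET.fromstring(xml_text)
    results = []
    for region in root.iter():
        if local_name(region.tag) != "TextRegion":
            continue
        for line in region:
            if local_name(line.tag) != "TextLine":
                continue
            for child in line:
                if local_name(child.tag) == "Baseline":
                    points = parse_points(child.get("points", ""))
                    results.append((region.get("id"), line.get("id"), points))
    return results


for region_id, line_id, points in read_baselines(SAMPLE):
    print(region_id, line_id, points)
```

Matching on the local tag name rather than a fixed namespace keeps the sketch independent of the PAGE schema version, which is an assumption made here for robustness rather than something the dataset description specifies.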
Competition Website: https://scriptnet.iit.demokritos.gr/competitions/5/
Files
(4.2 GB)
Checksum | Size |
---|---|
md5:658f7ca141acbcae72bc0751d1fd0006 | 2.3 GB |
md5:a9df8de41af192fb18c930c25e48b18f | 964.5 MB |
md5:1269d8b4a162d124a298966f48c770af | 572.3 MB |
md5:1f92e998c74f8864d0e226e1a3c6078a | 331.4 MB |
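The MD5 values listed with the files can be used to check that a download completed without corruption. The following is a minimal sketch using Python's `hashlib`. The function names `md5_of_file` and `matches_listing` are hypothetical helpers, and the path you pass in is whatever name the downloaded archive has locally, since the listing above does not show file names.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed value such as 'md5:658f7c...'."""
    # Accept the value with or without the 'md5:' prefix.
    expected = listed.strip().split(":", 1)[-1].lower()
    return md5_of_file(path) == expected
```

A call such as `matches_listing(local_path, "md5:658f7ca141acbcae72bc0751d1fd0006")` returns `True` only if the file at `local_path` hashes to the listed value. Reading in chunks keeps memory use flat, which matters for multi-gigabyte archives.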