Dataset for ICDAR2017 Competition on Handwritten Text Recognition on the READ Dataset (ICDAR2017 HTR)
Creators
- PRHLT, Universitat Politècnica de València, Spain
Description
Train-A: Dataset of pages with manually revised baselines and their corresponding transcripts. This batch is small: 50 pages. Please keep in mind that only the baselines have been manually corrected; the polygons associated with each line have not been manually reviewed.
Train-B: Dataset of pages without any layout or text line information. The corresponding transcripts are provided at page level with line breaks. It has 10k pages, divided for convenience into two 5k-page batches. This information is provided in PAGE format.
Test-A: Dataset of pages with manually revised baselines. This batch has 65 pages. The polygons associated with each line have not been manually reviewed.
Test-B1: The same pages as Test-A, but annotated only with the geometry of regions. Text line information is not provided.
Test-B2: Dataset of page images annotated with the geometry of the regions in which text lines have to be detected and recognized. It has 57 pages.
Baseline.tgz: Baseline system trained using the first 40 pages of Train-A. The system is based on Laia, a deep learning toolkit for transcribing handwritten text images.
More information at:
https://scriptnet.iit.demokritos.gr/competitions/~icdar2017htr/
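
Since the batches above are distributed in PAGE format, the following is a minimal sketch of how the baseline, polygon and transcript of each text line might be read from one PAGE XML file. It assumes the usual PAGE element names (`TextLine`, `Baseline`, `Coords`, `TextEquiv`/`Unicode`) and integer `x,y` point lists; the `SAMPLE` document and the helper names are invented for illustration and are not taken from this dataset, whose files may differ in detail.

```python
import xml.etree.ElementTree as ET

# Invented sample for illustration only; real files in the dataset may differ.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.jpg" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r1">
      <Coords points="0,0 1000,0 1000,1500 0,1500"/>
      <TextLine id="r1l1">
        <Coords points="10,20 200,20 200,60 10,60"/>
        <Baseline points="10,50 200,52"/>
        <TextEquiv><Unicode>example line</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def local_name(tag):
    """Strip the '{namespace}' prefix so the code works across PAGE schema versions."""
    return tag.rsplit("}", 1)[-1]


def parse_points(points):
    """Turn 'x1,y1 x2,y2 ...' into [(x1, y1), (x2, y2), ...] (assumes integer coordinates)."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def read_text_lines(xml_text):
    """Return one dict per TextLine with its id, baseline, polygon and transcript."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        record = {"id": elem.get("id"), "baseline": None, "polygon": None, "text": None}
        # Only direct children are inspected, so nested word-level elements are ignored.
        for child in elem:
            name = local_name(child.tag)
            if name == "Baseline":
                record["baseline"] = parse_points(child.get("points", ""))
            elif name == "Coords":
                record["polygon"] = parse_points(child.get("points", ""))
            elif name == "TextEquiv":
                for sub in child:
                    if local_name(sub.tag) == "Unicode":
                        record["text"] = sub.text or ""
        lines.append(record)
    return lines


if __name__ == "__main__":
    for line in read_text_lines(SAMPLE):
        print(line["id"], line["baseline"], line["text"])
```

For Train-A and Test-A, the `baseline` field is the manually revised annotation, while `polygon` corresponds to the line polygons that have not been manually reviewed, so downstream code may prefer to rely on the former.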
Files
(4.0 GB)
MD5 checksum | Size |
---|---|
5ef6d6d9a1be6785686559d6f8c9b67a | 22.1 MB |
f989a3f056d1b830564594a576b4dc75 | 70.9 MB |
6bea580c2fdcae850041738bc03d8c1c | 70.8 MB |
0bea41d3beab30431fdb3ad01f5929ab | 48.0 MB |
e46c7019f8ac639b796ecb8d872fd481 | 21.4 MB |
e11b9d0cb97169d64069268a23e90ef2 | 1.9 GB |
93ea0b7285f65c8438155e9490c691ed | 1.9 GB |
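
The MD5 checksums listed above can be used to check that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib`; because the listing does not tie each checksum to a file name here, the hypothetical helper `is_listed` only checks whether a file's digest matches any of the seven listed values.

```python
import hashlib
import sys

# MD5 digests copied from the file listing above.
KNOWN_MD5 = {
    "5ef6d6d9a1be6785686559d6f8c9b67a",
    "f989a3f056d1b830564594a576b4dc75",
    "6bea580c2fdcae850041738bc03d8c1c",
    "0bea41d3beab30431fdb3ad01f5929ab",
    "e46c7019f8ac639b796ecb8d872fd481",
    "e11b9d0cb97169d64069268a23e90ef2",
    "93ea0b7285f65c8438155e9490c691ed",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed(path):
    """Return True if the file's MD5 digest matches one of the listed checksums."""
    return md5_of(path) in KNOWN_MD5


if __name__ == "__main__":
    # Usage: python verify_md5.py <downloaded files...>
    for name in sys.argv[1:]:
        status = "OK" if is_listed(name) else "NO MATCH"
        print(f"{status}  {md5_of(name)}  {name}")
```

Chunked reading matters for the two 1.9 GB archives, which would otherwise be loaded into memory in one piece.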