Published May 31, 2021 | Version 1.1
Dataset | Open access
Gado2: multilingual newspapers from the Netherlands Indies
Contributors
Data collector:
Project leaders:
1. National Library of the Netherlands
2. Satya Wacana Christian University
Description
This dataset of Handwritten Text Recognition (HTR) files in PAGE XML format contains the ground truth for Gado2, a named entity processing application for newspapers from the Netherlands Indies and Indonesia (see https://github.com/KBNLresearch/gado2). Optical Character Recognition (OCR) produced high Character Error Rates (CER) because many of the scans are of poor quality. HTR, in contrast, achieved CERs below 0.5 percent, which improved the efficiency of the named entity recognition (NER) engine. All uploaded files are error-free and fully tagged. A knowledge base of Indonesian persons, places and organisations is attached in JSON format for entity linking.
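The description reports transcription quality as a Character Error Rate. CER is conventionally the character-level edit (Levenshtein) distance between a transcription and its ground truth, divided by the number of characters in the ground truth. The record does not say which tool produced the reported figures, so the Python below is only a generic sketch of that standard definition, not the Gado2 evaluation code; the function names and example strings are illustrative.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Character-level edit distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into "" takes i deletions
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free when r == h)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the ground-truth reference."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Illustrative example: one substituted character in a nine-character place name.
print(f"{cer('Soerabaja', 'Soerabaia'):.3f}")  # 0.111
```

At the 0.5 percent level reported above, a CER of 0.005 means fewer than one character edit per 200 characters of ground truth.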
Files (1.0 GB)
Previewed file: Douwes_dekker_0020.xml
Checksum | Size |
---|---|
md5:ae6b48512f88fb481f599f448e4a94b4 | 4.8 kB |
md5:7a15137a11f39d462cf5ae70ce65ea1a | 23.9 kB |
md5:a27b6fcb75ee9e824a86115d83c8fba9 | 6.0 kB |
md5:f1ab557f857374d3b622a44220bb7210 | 24.3 kB |
md5:dc5fd4b1c5e5d5f0163035b0a4a83972 | 4.9 kB |
md5:d59c37449ebec2e7bafa2d6475daad41 | 26.2 kB |
md5:9da786b3eb395485e332fa3919f4cfb8 | 4.2 kB |
md5:dcc716053a75526a4f930dafbed5fb8c | 24.5 kB |
md5:8258775047caeec347a167251937c085 | 24.2 kB |
md5:0345997f1c686ab4c9dd4bf37d1d7cf6 | 26.1 kB |
md5:86945a138679d8a03afc3839cb77bcce | 23.1 kB |
md5:33339828138e1d8894d97919f0072595 | 23.8 kB |
md5:b356b6878d9edb762c203ff9f91aafdc | 25.7 kB |
md5:63f72a536f714fe5bb893b45fab2c8c4 | 24.5 kB |
md5:1f9cb54bc4e5868f05677c9e61726970 | 25.0 kB |
md5:33fba207dcfaac034c88bf087da8fa3c | 24.5 kB |
md5:91368e1dc7c0fbd85066119ecc3f46f2 | 4.1 kB |
md5:d7daa8e6ec6f2e8667b16f3bb95957fc | 24.9 kB |
md5:7e6cb6c59fd4e94259399125896aab2c | 24.9 kB |
md5:6c64facfaba59115985517c7c412a848 | 25.9 kB |
md5:d00694b2ff7fb6ceb2803b00bd8d0f16 | 24.2 kB |
md5:23aff5e9b03ed9b6d41fd18cdf9ba992 | 24.3 kB |
md5:eb286aa05395347468f211c205c61dee | 25.2 kB |
md5:f7613b18079dd76dd4b0dd670b183769 | 24.2 kB |
md5:3770079731eecc794464e4eefebe5506 | 4.9 kB |
md5:b517d2d5e002c23a7fd4677e504d3c6f | 25.2 kB |
md5:46df2a8abda23c69ff80861c388c85b7 | 23.8 kB |
md5:eeeedecd85a21022e3fb2b825909c1c3 | 23.0 kB |
md5:8dbcd473756c7301c3fbf698d985785f | 12.6 kB |
md5:0ad7630607e8d11eab2993674871093c | 5.1 kB |
md5:e9455c0a5328640f947421de741ff6bb | 6.1 kB |
md5:61de1df74a338e93209e8dff369cde43 | 23.7 kB |
md5:731414f1fbb91129f29b5d636404a685 | 72.3 kB |
md5:ecfdabfbaef804c5262dc504ed848271 | 91.7 kB |
md5:9f328127049d2f4f88901d8ff6dfc588 | 82.9 kB |
md5:fdaba8b0420f85a31d792730772a5be4 | 78.6 kB |
md5:432336ea898eb1e05346f2d605f28643 | 1.0 GB |
md5:e857aa1ab9c9733a7ce7fb582cab3e34 | 1.0 MB |
md5:2b04f2629be0da4e3c0ef256c4131967 | 13.7 MB |
md5:e857aa1ab9c9733a7ce7fb582cab3e34 | 1.0 MB |
md5:b0d663e2b25f91921c9c0a0c9ee77a51 | 13.5 kB |
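Each file above is listed with an MD5 checksum (written as `md5:` followed by 32 hexadecimal digits) and a size. You can check a downloaded file against its listed checksum using Python's standard `hashlib` module. The sketch below reads the file in 1 MiB chunks, so the 1.0 GB file does not have to fit in memory. The helper names `md5_of_file` and `verify` are illustrative, not part of any published tooling for this dataset. It assumes Python 3.9 or later for `str.removeprefix`.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks (1 MiB by default) to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" at end of file
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, listed: str) -> bool:
    """True if the file matches a checksum as listed in the table, e.g. 'md5:ae6b...'."""
    # Normalise case and drop the optional 'md5:' prefix before comparing
    expected = listed.strip().lower().removeprefix("md5:")
    return md5_of_file(path) == expected
```

For example, `verify(Path("downloaded_file"), "md5:b0d663e2b25f91921c9c0a0c9ee77a51")`, where the path is a hypothetical local name, returns `True` only if the local file is byte-identical to the one listed with that checksum. MD5 is adequate for detecting corrupted or truncated downloads, though it gives no protection against deliberate tampering.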