Published May 31, 2021 | Version 1.1
Dataset | Open access

Gado2: multilingual newspapers from the Netherlands Indies

  • Koninklijke Bibliotheek

Contributors

Data collector:

  • National Library of the Netherlands
  • Satya Wacana Christian University

Description

This dataset contains Handwritten Text Recognition (HTR) ground truth in PAGE XML format for Gado2, a named entity processing application for newspapers from the Netherlands Indies and Indonesia; see https://github.com/KBNLresearch/gado2. Optical Character Recognition (OCR) resulted in high Character Error Rates (CER) because many of the scans are of poor quality. HTR, in contrast, achieved CERs below 0.5 percent, which increased the efficiency of the named entity recognition (NER) engine. All uploaded files are free of errors and fully tagged. A knowledge base of relevant Indonesian persons, places and organisations is attached in JSON format for entity linking.
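The CER figures above come from comparing a transcription with the ground truth. The following is a minimal sketch of such a comparison, assuming the files follow the usual PAGE XML layout (TextLine > TextEquiv > Unicode). It collects line-level text with Python's standard `xml.etree.ElementTree` and computes CER as the Levenshtein edit distance divided by the length of the reference. The function names and the sample page are illustrative only, and this is not necessarily how the Gado2 project computed its figures.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, List, Union


def _local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def read_page_lines(source: Union[str, IO[bytes]]) -> List[str]:
    """Return the transcription of every TextLine in a PAGE XML file.

    Assumes the layout TextLine > TextEquiv > Unicode; the PAGE namespace
    version is ignored by comparing local tag names only.
    """
    root = ET.parse(source).getroot()
    lines = []
    for el in root.iter():
        if _local(el.tag) != "TextLine":
            continue
        # Use only the line-level TextEquiv (a direct child of TextLine),
        # not TextEquiv elements nested inside Word or Glyph children.
        for child in el:
            if _local(child.tag) == "TextEquiv":
                for uni in child:
                    if _local(uni.tag) == "Unicode":
                        lines.append(uni.text or "")
                break
    return lines


def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical one-line page, used only to demonstrate the functions.
    sample = b"""<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page.jpg" imageWidth="100" imageHeight="100">
    <TextRegion id="r1">
      <TextLine id="l1"><TextEquiv><Unicode>Bataviaasch Nieuwsblad</Unicode></TextEquiv></TextLine>
    </TextRegion>
  </Page>
</PcGts>"""
    gt = read_page_lines(io.BytesIO(sample))
    print(gt)                                              # ['Bataviaasch Nieuwsblad']
    print(round(cer(gt[0], "Bataviaasch Nieuwsbiad"), 4))  # 0.0455 (1 error in 22 chars)
```

Joining whole pages with `"\n".join(...)` before calling `cer` counts line breaks as characters; averaging per-line scores instead weights short and long lines equally, so the two approaches can give slightly different numbers.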

Files (1.0 GB)

Example file: Douwes_dekker_0020.xml

Checksum (MD5)                    Size
ae6b48512f88fb481f599f448e4a94b4  4.8 kB
7a15137a11f39d462cf5ae70ce65ea1a  23.9 kB
a27b6fcb75ee9e824a86115d83c8fba9  6.0 kB
f1ab557f857374d3b622a44220bb7210  24.3 kB
dc5fd4b1c5e5d5f0163035b0a4a83972  4.9 kB
d59c37449ebec2e7bafa2d6475daad41  26.2 kB
9da786b3eb395485e332fa3919f4cfb8  4.2 kB
dcc716053a75526a4f930dafbed5fb8c  24.5 kB
8258775047caeec347a167251937c085  24.2 kB
0345997f1c686ab4c9dd4bf37d1d7cf6  26.1 kB
86945a138679d8a03afc3839cb77bcce  23.1 kB
33339828138e1d8894d97919f0072595  23.8 kB
b356b6878d9edb762c203ff9f91aafdc  25.7 kB
63f72a536f714fe5bb893b45fab2c8c4  24.5 kB
1f9cb54bc4e5868f05677c9e61726970  25.0 kB
33fba207dcfaac034c88bf087da8fa3c  24.5 kB
91368e1dc7c0fbd85066119ecc3f46f2  4.1 kB
d7daa8e6ec6f2e8667b16f3bb95957fc  24.9 kB
7e6cb6c59fd4e94259399125896aab2c  24.9 kB
6c64facfaba59115985517c7c412a848  25.9 kB
d00694b2ff7fb6ceb2803b00bd8d0f16  24.2 kB
23aff5e9b03ed9b6d41fd18cdf9ba992  24.3 kB
eb286aa05395347468f211c205c61dee  25.2 kB
f7613b18079dd76dd4b0dd670b183769  24.2 kB
3770079731eecc794464e4eefebe5506  4.9 kB
b517d2d5e002c23a7fd4677e504d3c6f  25.2 kB
46df2a8abda23c69ff80861c388c85b7  23.8 kB
eeeedecd85a21022e3fb2b825909c1c3  23.0 kB
8dbcd473756c7301c3fbf698d985785f  12.6 kB
0ad7630607e8d11eab2993674871093c  5.1 kB
e9455c0a5328640f947421de741ff6bb  6.1 kB
61de1df74a338e93209e8dff369cde43  23.7 kB
731414f1fbb91129f29b5d636404a685  72.3 kB
ecfdabfbaef804c5262dc504ed848271  91.7 kB
9f328127049d2f4f88901d8ff6dfc588  82.9 kB
fdaba8b0420f85a31d792730772a5be4  78.6 kB
432336ea898eb1e05346f2d605f28643  1.0 GB
e857aa1ab9c9733a7ce7fb582cab3e34  1.0 MB
2b04f2629be0da4e3c0ef256c4131967  13.7 MB
e857aa1ab9c9733a7ce7fb582cab3e34  1.0 MB
b0d663e2b25f91921c9c0a0c9ee77a51  13.5 kB
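Each file is listed with an MD5 checksum. The following is a minimal sketch for checking a download against its listed value using Python's standard `hashlib`. The file is read in fixed-size chunks, so even the 1.0 GB file does not have to fit in memory. The function names `md5sum` and `verify`, and the local path in the usage line, are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Union


def md5sum(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading 1 MiB at a time by default."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read until it returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Union[str, Path], expected: str) -> bool:
    """Compare a file's MD5 with a listed checksum; an 'md5:' prefix is accepted."""
    return md5sum(path) == expected.strip().lower().split(":")[-1]


# Usage (hypothetical local path; the listing does not give file names):
# verify("path/to/downloaded_file", "md5:432336ea898eb1e05346f2d605f28643")
```

MD5 is adequate for detecting corrupted or incomplete downloads, although it is no longer considered secure against deliberate tampering.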