Liesbeth Keijser
2020-01-21
<p>The National Archives of the Netherlands and the Noord-Hollands Archief started a collaboration with the Transkribus HTR (Handwritten Text Recognition) platform to semi-automatically transcribe 2 million pages of old Dutch texts. The archives comprise 17th and 18th century material from the Dutch East India Company (VOC) and 19th century notarial deeds from the city of Haarlem.<br>
To train the HTR software, manual transcriptions had to be produced first.</p>
<p>These datasets contain the scans (.jpg images) together with the word-level transcriptions in ALTO XML format that were created to train the HTR model for text recognition.<br>
<br>
The first set contains scans and transcriptions from the Verenigde Oost-Indische Compagnie (VOC) archive; its inventory can be found here: <a href="http://www.gahetna.nl/archievenoverzicht/pdf/NL-HaNA_1.04.02.ead.pdf">http://www.gahetna.nl/archievenoverzicht/pdf/NL-HaNA_1.04.02.ead.pdf</a></p>
<p><strong>Inventory numbers</strong><br>
The transcripts are samples of the following inventory numbers: 7528-9540</p>
<p><strong>Country/place</strong><br>
Dutch East Indies (modern-day Indonesia) / Batavia (modern-day Jakarta)</p>
<p><strong>Language</strong><br>
Dutch</p>
<p><strong>Number of transcriptions</strong><br>
4735 (mostly split)</p>
<p>-------------------------------------------------------------</p>
<p>The second set contains scans and transcriptions from the notarial deeds of Haarlem; its inventories can be found here:<br>
<a href="https://noord-hollandsarchief.nl/bronnen/archieven?mivast=236&mizig=210&miadt=236&micode=1972&milang=nl&miview=inv2">https://noord-hollandsarchief.nl/bronnen/archieven?mivast=236&mizig=210&miadt=236&micode=1972&milang=nl&miview=inv2</a><br>
<a href="https://noord-hollandsarchief.nl/bronnen/archieven?mivast=236&mizig=210&miadt=236&micode=1617&milang=nl&miview=inv2">https://noord-hollandsarchief.nl/bronnen/archieven?mivast=236&mizig=210&miadt=236&micode=1617&milang=nl&miview=inv2</a></p>
<p>This set also contains scans and transcriptions from other notarial archives in the Dutch provinces; these are, however, few in number.</p>
<p><strong>Inventory numbers</strong><br>
The transcripts are samples of the following inventory numbers: 1617_1600 to 1617_1805 and 1972_5 to 1972_813</p>
<p><strong>Country/place</strong><br>
The Netherlands / Haarlem</p>
<p><strong>Language</strong><br>
Dutch and sometimes French</p>
<p><strong>Number of transcriptions</strong><br>
1615 (mostly spread)</p>
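<p>As a rough illustration of how the word-level ALTO XML transcriptions described above might be read, the following minimal sketch collects the words of each text line using only the Python standard library. It assumes the usual ALTO structure of <code>TextLine</code> elements containing <code>String</code> elements with a <code>CONTENT</code> attribute. The ALTO namespace version used in these datasets is not stated here, so the sketch matches on local tag names only. The function name <code>alto_lines</code> and the sample words are invented for the example and are not taken from the datasets.</p>

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text):
    """Return one string per ALTO TextLine, joining its word-level String elements."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        # Direct children of a TextLine include String (words) and SP (spaces).
        words = [
            child.get("CONTENT", "")
            for child in elem
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(words))
    return lines


# Invented sample; the namespace version and the words are assumptions.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page>
      <PrintSpace>
        <TextBlock>
          <TextLine>
            <String CONTENT="Aan"/><SP/><String CONTENT="de"/><SP/><String CONTENT="Heeren"/>
          </TextLine>
          <TextLine>
            <String CONTENT="tot"/><SP/><String CONTENT="Batavia"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""

if __name__ == "__main__":
    for line in alto_lines(SAMPLE):
        print(line)
```

<p>To process a real file, the XML text could be read from disk and passed to the same function; pairing each ALTO file with its .jpg scan depends on the file naming inside the archives, which is not described here.</p>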
https://doi.org/10.5281/zenodo.3613666
oai:zenodo.org:3613666
odt
Zenodo
https://doi.org/10.5281/zenodo.3517776
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Transcriptions
Verenigde Oost-Indische Compagnie
Notarial deeds
Nationaal Archief
Noord-Hollands Archief
Transkribus
Scans and transcriptions of the VOC and the Haarlem notarial deeds archives
info:eu-repo/semantics/other