There is a newer version of this record available.

Dataset Open Access

Scans and transcriptions of the VOC and the Haarlem notarial deeds archives

Liesbeth Keijser

The National Archives of the Netherlands and the Noord-Hollands Archief started a collaboration with the Transkribus HTR (Handwritten Text Recognition) platform in order to semi-automatically transcribe 2 million pages of old Dutch texts. The material consists of 17th- and 18th-century records from the Dutch East India Company (VOC) and 19th-century notarial deeds from the city of Haarlem.
To train the HTR software, transcriptions first had to be made by hand.

These datasets contain the scans (.jpg images) together with the word-level transcriptions in ALTO XML format that were made to train the HTR model for text recognition.
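As a rough illustration of how word-level ALTO transcriptions can be read, the sketch below collects the `CONTENT` attribute of every `String` element and groups the words by `TextLine`, which is how the ALTO schema stores recognised words. It matches elements by local name because the ALTO version (and therefore the XML namespace) used in these archives is not stated here; the `SAMPLE` document and the helper names `local_name` and `alto_lines` are made up for the example and are not taken from the dataset.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(root):
    """Return one string per TextLine, joining its String/@CONTENT values."""
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        # Only String children carry text; SP (space) and HYP are skipped.
        words = [
            child.get("CONTENT", "")
            for child in elem
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(words))
    return lines


# Made-up two-line sample, NOT taken from the dataset.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine>
      <String CONTENT="Compareerde"/><SP/><String CONTENT="voor"/><SP/><String CONTENT="mij"/>
    </TextLine>
    <TextLine>
      <String CONTENT="notaris"/><SP/><String CONTENT="te"/><SP/><String CONTENT="Haarlem"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

sample_lines = alto_lines(ET.fromstring(SAMPLE))
print(sample_lines)
```

For a file extracted from one of the archives, the same function could be called as `alto_lines(ET.parse(path).getroot())`, where `path` is wherever the XML file was unpacked.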

The first set contains scans and transcriptions from the Verenigde Oost-Indische Compagnie (VOC) archive. Its inventory can be found here: http://www.gahetna.nl/archievenoverzicht/pdf/NL-HaNA_1.04.02.ead.pdf

Inventory numbers
The transcripts are samples from the following inventory numbers: 7528-9540

Country/place
Dutch East Indies (modern-day Indonesia) / Batavia (modern-day Jakarta)

Language
Dutch

Number of transcriptions
4735 (mostly split)

-------------------------------------------------------------

The second set contains scans and transcriptions from the notarial deeds of Haarlem. Its inventories can be found here:
https://noord-hollandsarchief.nl/bronnen/archieven?mivast=236&mizig=210&miadt=236&micode=1972&milang=nl&miview=inv2
https://noord-hollandsarchief.nl/bronnen/archieven?mivast=236&mizig=210&miadt=236&micode=1617&milang=nl&miview=inv2

This set also contains scans and transcriptions from other notarial archives in the Dutch provinces, though these are few in number.

Inventory numbers
The transcripts are samples from the following inventory numbers: 1617_1600 to 1617_1805 and 1972_5 to 1972_813

Country/place
The Netherlands / Haarlem

Language
Dutch and sometimes French

Number of transcriptions
1615 (mostly spread)

Files (25.8 GB)
Name                                         Size      Checksum
Notarial deeds.7z                            7.3 GB    md5:8f71da1b285b1415bbd182620d91bb32
Verenigde Oost-Indische Compagnie (VOC).7z   18.5 GB   md5:f70408f9772df48d45acd585ee09ad8e
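Since both archives are several gigabytes, it may be worth checking a download against the md5 values listed for the two files. The following is a minimal sketch that hashes a file in chunks so the whole archive never has to fit in memory; the helper names `md5_of_stream` and `verify_archive` are hypothetical, and the path passed to `verify_archive` is assumed to be wherever the archive was saved locally.

```python
import hashlib
import io

# Checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "Notarial deeds.7z": "8f71da1b285b1415bbd182620d91bb32",
    "Verenigde Oost-Indische Compagnie (VOC).7z": "f70408f9772df48d45acd585ee09ad8e",
}


def md5_of_stream(stream, chunk_size=1024 * 1024):
    """Return the hex md5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected_hex):
    """Return True if the file at `path` has the expected md5 digest."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle) == expected_hex.lower()


# In-memory demonstration so the sketch runs without the real archives.
demo_digest = md5_of_stream(io.BytesIO(b"hello"))
print(demo_digest)
```

With a downloaded archive, a check could look like `verify_archive("Notarial deeds.7z", EXPECTED_MD5["Notarial deeds.7z"])`.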
                   All versions   This version
Views              866            432
Downloads          223            114
Data volume        2.3 TB         1.5 TB
Unique views       728            373
Unique downloads   120            75
