There is a newer version of this record available.

Dataset Open Access

Scans and transcriptions of the VOC and the Haarlem notarial deeds archives

Liesbeth Keijser


DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd">
  <identifier identifierType="DOI">10.5281/zenodo.3906480</identifier>
  <creators>
    <creator>
      <creatorName>Liesbeth Keijser</creatorName>
      <affiliation>National Archive Netherlands</affiliation>
    </creator>
  </creators>
  <titles>
    <title>Scans and transcriptions of the VOC and the Haarlem notarial deeds archives</title>
  </titles>
  <publisher>Zenodo</publisher>
  <publicationYear>2020</publicationYear>
  <subjects>
    <subject>Transcriptions</subject>
    <subject>Verenigde Oost-Indische Compagnie</subject>
    <subject>Notarial deeds</subject>
    <subject>Nationaal Archief</subject>
    <subject>Noord-Hollands Archief</subject>
    <subject>Transkribus</subject>
  </subjects>
  <dates>
    <date dateType="Issued">2020-01-21</date>
  </dates>
  <resourceType resourceTypeGeneral="Dataset"/>
  <alternateIdentifiers>
    <alternateIdentifier alternateIdentifierType="url">https://zenodo.org/record/3906480</alternateIdentifier>
  </alternateIdentifiers>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.5281/zenodo.3517776</relatedIdentifier>
  </relatedIdentifiers>
  <version>4.0</version>
  <rightsList>
    <rights rightsURI="https://creativecommons.org/licenses/by/4.0/legalcode">Creative Commons Attribution 4.0 International</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
  </rightsList>
  <descriptions>
    <description descriptionType="Abstract">&lt;p&gt;The National Archives of the Netherlands and the Noord-Hollands Archief started a collaboration with the Transkribus HTR (Handwritten Text Recognition) platform in order to semi-automatically transcribe 2 million pages of old Dutch texts. The archives comprise 17th- and 18th-century material from the Dutch East India Company (VOC) and 19th-century notarial deeds from the city of Haarlem.&lt;br&gt;
In order to train the HTR software, manual transcriptions had to be made first.&amp;nbsp;&lt;/p&gt;

&lt;p&gt;These datasets contain the scans (.jpg images) together with the transcriptions in ALTO XML format (word level) that were made to train the HTR model for text recognition.&lt;br&gt;
&lt;br&gt;
The first set contains scans and transcriptions from the Verenigde Oost-Indische Compagnie (VOC) archive; its inventory can be found here:&amp;nbsp;&lt;a href="http://www.gahetna.nl/archievenoverzicht/pdf/NL-HaNA_1.04.02.ead.pdf"&gt;http://www.gahetna.nl/archievenoverzicht/pdf/NL-HaNA_1.04.02.ead.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Inventory numbers&lt;/strong&gt;&lt;br&gt;
The transcripts are samples of the following inventory numbers: 7527-9540&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Country/place&lt;/strong&gt;&lt;br&gt;
Dutch East Indies (modern-day Indonesia) / Batavia&amp;nbsp;(modern-day Jakarta)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Language&lt;/strong&gt;&lt;br&gt;
Dutch&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Number of transcriptions&lt;/strong&gt;&lt;br&gt;
4735&amp;nbsp;(mostly split)&lt;/p&gt;

&lt;p&gt;-------------------------------------------------------------&lt;/p&gt;

&lt;p&gt;The second set contains scans and transcriptions from the notarial deeds of Haarlem; its inventories can be found here:&lt;br&gt;
&lt;a href="https://noord-hollandsarchief.nl/bronnen/archieven?mivast=236&amp;amp;mizig=210&amp;amp;miadt=236&amp;amp;micode=1972&amp;amp;milang=nl&amp;amp;miview=inv2"&gt;https://noord-hollandsarchief.nl/bronnen/archieven?mivast=236&amp;amp;mizig=210&amp;amp;miadt=236&amp;amp;micode=1972&amp;amp;milang=nl&amp;amp;miview=inv2&lt;/a&gt;&lt;br&gt;
&lt;a href="https://noord-hollandsarchief.nl/bronnen/archieven?mivast=236&amp;amp;mizig=210&amp;amp;miadt=236&amp;amp;micode=1617&amp;amp;milang=nl&amp;amp;miview=inv2"&gt;https://noord-hollandsarchief.nl/bronnen/archieven?mivast=236&amp;amp;mizig=210&amp;amp;miadt=236&amp;amp;micode=1617&amp;amp;milang=nl&amp;amp;miview=inv2&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This set also contains a small number of scans and transcriptions from other notarial archives in&amp;nbsp;Dutch provinces.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Inventory numbers&lt;/strong&gt;&lt;br&gt;
The transcripts are samples of the following inventory numbers: 1617_1593&amp;nbsp;to 1617_1805 and 1972_5&amp;nbsp;to 1972_813&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Country/place&lt;/strong&gt;&lt;br&gt;
The Netherlands&amp;nbsp;/ Haarlem&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Language&lt;/strong&gt;&lt;br&gt;
Dutch and sometimes French&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Number of transcriptions&lt;/strong&gt;&lt;br&gt;
1615&amp;nbsp;(mostly spread)&lt;/p&gt;

&lt;p&gt;-------------------------------------------------------------&lt;/p&gt;

&lt;p&gt;The following HTR model was used for recognition: &amp;quot;IJsberg&amp;quot;. More information about the model can be found here:&amp;nbsp;&lt;a href="https://transkribus.eu/wiki/images/d/d6/Public_Models_in_Transkribus.pdf"&gt;https://transkribus.eu/wiki/images/d/d6/Public_Models_in_Transkribus.pdf&lt;/a&gt;. See the chapter &amp;quot;Dutch Handwriting&amp;quot;.&lt;/p&gt;

&lt;p&gt;-------------------------------------------------------------&lt;/p&gt;

&lt;p&gt;Update: upon request, PAGE XML files of the transcriptions have been added and can be downloaded separately.&lt;/p&gt;

&lt;p&gt;Version 3.0: The first HTR results from the VOC collection are available in .txt format&amp;nbsp;(inventory numbers 7527-9540).&lt;/p&gt;

&lt;p&gt;Version 3.1: The HTR results from the VOC collection are also available in PAGE XML format.&amp;nbsp;&lt;/p&gt;

&lt;p&gt;Version 4.0: About 30 missing inventory numbers have been added to the VOC transcriptions. The HTR results of the&amp;nbsp;notarial deeds from the NHA archives have been added. An example of research using full-text search can be found here (in Dutch):&amp;nbsp;&lt;a href="https://kia.pleio.nl/groups/view/55812425/htr-en-ocr/blog/view/55814752/reconstructie-van-een-verijdelde-slavenopstand-met-behulp-van-automatische-handschriftherkenning-en-text-mining"&gt;https://kia.pleio.nl/groups/view/55812425/htr-en-ocr/blog/view/55814752/reconstructie-van-een-verijdelde-slavenopstand-met-behulp-van-automatische-handschriftherkenning-en-text-mining&lt;/a&gt;&lt;/p&gt;</description>
  </descriptions>
</resource>
                   All versions   This version
Views              12,803         972
Downloads          2,824          1,001
Data volume        12.4 TB        5.6 TB
Unique views       10,795         860
Unique downloads   1,288          258
