Published December 6, 2023 | Version Camready; rc0-0, with pagexml, code, spreadsheets
Dataset Open

The Middle Dutch Manuscripts Surviving from the Carthusian Monastery of Herne (14th century)

Description

This repository contains the dataset described in the following conference paper:

Wouter Haverals & Mike Kestemont, "The Middle Dutch Manuscripts Surviving from the Carthusian Monastery of Herne (14th century): Constructing an Open Dataset of Digital Transcriptions". CHR 2023: Computational Humanities Research Conference. December 6-8, 2023, Paris, France.

The dataset consists of (automatically created) hyper-diplomatic, digital transcriptions of 18 Middle Dutch manuscripts that survive from the Carthusian monastery in Herne in present-day Belgium (or that have meaningful ties with the charterhouse). These manuscripts primarily date to the second half of the fourteenth century and offer exciting possibilities for the analysis of authorship, translatorship and scribal practices in the history of the Low Countries. The transcriptions have been (partly) automated through the use of handwritten text recognition (on the Transkribus platform). This dataset is licensed under a CC-BY 4.0 licence, encouraging the further re-use of this data for all purposes, provided an unambiguous scholarly reference to the paper above is given.

Content

Transcriptions for the following 18 manuscripts are included in various formats:

  • Brussels, RL, 1805-1808
  • Brussels, RL, 2485
  • Brussels, RL, 2849-51
  • Brussels, RL, 2877-78
  • Brussels, RL, 2879-80
  • Brussels, RL, 2905-09
  • Brussels, RL, 2979
  • Brussels, RL, 3091
  • Brussels, RL, 3093-95
  • Ghent, UL, 1374
  • Ghent, UL, 941
  • Paris, Bibl. Mazarine, 920
  • Paris, Bibl. de l'Arsenal, 8224
  • Saint Petersburg, BAN, O 256
  • Vienna, ÖNB, SN 12.857
  • Vienna, ÖNB, SN 12.905
  • Vienna, ÖNB, Cod. 13.708
  • Vienna, ÖNB, SN 65

The contents of the repository have been structured as follows:

  • transcriptions: transcriptions of the 18 manuscripts in various formats (hyper-diplomatic; i.e. without brevigraph expansion):
    • pagexmls: one file per folium, encoded in the PAGE XML format as output by Transkribus. One zip file per manuscript folder.
  • spreadsheets.zip: detailed metadata on various aspects of the data in spreadsheet format.
    • silent_voices_summary.xlsx: summary statistics at the codex-level (cf. Table 2 in the paper)
    • codex_info.xlsx: folium-level metadata
    • manuscript_data_metadata.xlsx: text region-level metadata
    • manuscript_data_metadata_rich.xlsx: the most convenient and complete version of the dataset, including the texts with automatically expanded abbreviations and the linguistic enrichment (lemmas and part-of-speech tags).
  • code: Python notebooks (requiring Python >= 3.8).
    • transduction.ipynb: the notebook for the replication of the abbreviation expansion experiments described in the paper. (See also the configuration file for the tagger, norm.json.)
    • enrich.ipynb: the notebook used for the linguistic enrichment of the expanded texts, on the basis of the PIE(-NLP) lemmatizer. (See also the PIE model file herne-norm.tar, which is used in the enrichment.)
    • requirements.txt: third-party dependencies for running the code in these notebooks. Note: enrich.ipynb additionally requires the Middle Dutch (DUM) model for nlp-pie.
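
As an illustration of how the per-folium files in pagexmls might be read, the sketch below collects the transcribed lines from one PAGE XML document using only the Python standard library. It assumes the usual PAGE layout (TextRegion > TextLine > TextEquiv > Unicode), which Transkribus exports normally follow but which has not been checked against every file in this dataset. The function name read_page_lines and the embedded sample (with invented placeholder text) are illustrative and not part of the repository.

```python
import io
import xml.etree.ElementTree as ET


def read_page_lines(source):
    """Return (region_id, line_id, text) tuples from one PAGE XML document.

    `source` may be a file path or a binary file-like object.
    """
    root = ET.parse(source).getroot()

    # The PAGE namespace URI differs between schema versions, so it is
    # read from the root tag ("{uri}PcGts") rather than hard-coded.
    uri = root.tag[1:root.tag.index("}")] if root.tag.startswith("{") else ""

    def q(name):
        """Qualify a tag name with the document's namespace, if any."""
        return f"{{{uri}}}{name}" if uri else name

    rows = []
    for region in root.iter(q("TextRegion")):
        # Only direct TextLine children, so nested regions are not counted twice.
        for line in region.findall(q("TextLine")):
            # Line-level text; word-level TextEquiv elements are ignored.
            unicode_el = line.find(f"{q('TextEquiv')}/{q('Unicode')}")
            text = unicode_el.text if unicode_el is not None and unicode_el.text else ""
            rows.append((region.get("id"), line.get("id"), text))
    return rows


# Hypothetical miniature document with invented placeholder text,
# used here only to show the expected structure.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.jpg" imageWidth="100" imageHeight="100">
    <TextRegion id="r1">
      <TextLine id="r1l1"><TextEquiv><Unicode>dit es een</Unicode></TextEquiv></TextLine>
      <TextLine id="r1l2"><TextEquiv><Unicode>voorbeeld</Unicode></TextEquiv></TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

sample_rows = read_page_lines(io.BytesIO(SAMPLE.encode("utf-8")))
for region_id, line_id, text in sample_rows:
    print(region_id, line_id, text)
```

For a real folium, pass the path of an unzipped XML file instead of the in-memory sample. The transcriptions are hyper-diplomatic, so the returned strings keep brevigraphs unexpanded.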

Related data

  • The final Transkribus model used to generate the transcriptions will be made publicly available on the platform.
  • The accompanying images are released in a separate, restricted access repository on Zenodo, because we were unable to clear the copyright on some of the facsimiles. We will only be able to share these images under very strict conditions.

Acknowledgments

Thanks to Anouck Kuypers, Sam Verellen and Frans de Jonge for their work on the transcriptions. The transcription of Brussels, RL, 3093-95 was contributed by Dr. Ine Kiekens. We acknowledge the help of Renée Gabriël and Peter Boot in previous collaborations that relate to the present paper. Finally, we would like to thank Caroline Vandyck who has helped with the finalization of the dataset.

Funding statement

This work has been funded by the Flemish Research Agency (FWO) in the context of the project "Silent voices: A Digital Study of the Herne Charterhouse as a Textual Community (ca. 1350-1400)".

Files

Brussel, KBR, 1805-1808.zip

Files (166.3 MB)

md5:110fbf52e79f6325285c8782562dc8e1 (5.9 MB)
md5:30e47db7e93f867d0fc9b1c3292ada95 (7.3 MB)
md5:6acfb6a4179777bd09ae52929ee1d565 (8.2 MB)
md5:2ce0a784fd7cd22ec4719c16fec0f8fd (32.7 MB)
md5:64b50ea3cd1e61e54c0cb651b9f1a4d9 (24.7 MB)
md5:fdc6d55e845ea5490e9c32702de041a1 (4.6 MB)
md5:809c2f1a590833f785affaee5d6bf018 (583.7 kB)
md5:1847c475e064eb5ea6821438b1529e0c (4.4 MB)
md5:c845d2c7c067e92e110d3aa3b2817803 (2.0 MB)
md5:f9f3e23bf256e21b9dd002ff870ef0e7 (3.3 MB)
md5:c79d72729e050669f5384f5e3b4e221c (13.6 MB)
md5:a61151e04bf5fdb56450b7f0bc2f7af0 (3.1 MB)
md5:a3a00ec9ffa5a21d77882435e5e3663e (486.7 kB)
md5:b0577e22222b69da83348dd2430158cf (6.8 MB)
md5:845818df33e16687e6e005d6ef8fbc8e (3.2 MB)
md5:12dda273f9f354e6d90f810233aa9a80 (16.3 MB)
md5:66fc0a1669f93792278ab2179f7cb133 (4.9 MB)
md5:6282a26f717b7b82a4f242d4b36ffc05 (4.1 MB)
md5:4bb1084ea438ffb5b0446dffa8d2d5c5 (16.2 MB)
md5:412fd53719b0300c0b492bf7146db6bc (3.8 MB)
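
The md5 values listed above can be used to check that a downloaded file arrived intact. The following is a minimal sketch using Python's hashlib; the helper names md5_of_stream, md5_of_file and matches_checksum are illustrative, and which checksum belongs to which file has to be taken from the record page itself, since the file names are not reproduced in this listing.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=1 << 20):
    """Return the hex MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path):
    """Return the hex MD5 digest of the file at `path`."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle)


def matches_checksum(actual_hex, listed):
    """Compare a computed digest with a listed value such as 'md5:110f...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return actual_hex.lower() == expected


# Self-contained demonstration on in-memory bytes rather than a real download.
demo_digest = md5_of_stream(io.BytesIO(b"hello"))
print(demo_digest)
print(matches_checksum(demo_digest, "md5:5d41402abc4b2a76b9719d911017c592"))
```

For a downloaded archive, something like matches_checksum(md5_of_file(path), listed_value) would be used, with path and listed_value supplied by the user.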

Additional details

Related works

Is derived from
Dataset: 10.5281/zenodo.10005275 (DOI)