The Middle Dutch Manuscripts Surviving from the Carthusian Monastery of Herne (14th century)
Description
This repository contains the dataset described in the following conference paper:
Wouter Haverals & Mike Kestemont, "The Middle Dutch Manuscripts Surviving from the Carthusian Monastery of Herne (14th century): Constructing an Open Dataset of Digital Transcriptions". CHR 2023: Computational Humanities Research Conference. December 6-8, 2023, Paris, France.
The dataset consists of (automatically created) hyper-diplomatic digital transcriptions of 18 Middle Dutch manuscripts that survive from the Carthusian monastery of Herne in present-day Belgium, or that have meaningful ties with this charterhouse. These manuscripts date primarily to the second half of the fourteenth century and offer exciting possibilities for the analysis of authorship, translatorship and scribal practices in the history of the Low Countries. The transcriptions were (partly) automated through handwritten text recognition (HTR) on the Transkribus platform. This dataset is licensed under a CC BY 4.0 licence, which encourages further reuse of the data for all purposes, provided that an unambiguous scholarly reference to the paper above is given.
Content
Transcriptions for the following 18 manuscripts are included in various formats:
- Brussels, RL, 1805-1808
- Brussels, RL, 2485
- Brussels, RL, 2849-51
- Brussels, RL, 2877-78
- Brussels, RL, 2879-80
- Brussels, RL, 2905-09
- Brussels, RL, 2979
- Brussels, RL, 3091
- Brussels, RL, 3093-95
- Ghent, UL, 1374
- Ghent, UL, 941
- Paris, Bibl. Mazarine, 920
- Paris, Bibl. de l'Arsenal, 8224
- Saint Petersburg, BAN, O 256
- Vienna, ÖNB, SN 12.857
- Vienna, ÖNB, SN 12.905
- Vienna, ÖNB, Cod. 13.708
- Vienna, ÖNB, SN 65
The contents of the repository have been structured as follows:
- transcriptions: transcriptions of the 18 manuscripts in various formats (hyper-diplomatic, i.e. without expansion of brevigraphs):
  - pagexmls: one file per folium, encoded in the PAGE XML format as exported by Transkribus, with each manuscript's folder packaged as a separate zip file.
  - spreadsheets.zip: detailed metadata on various aspects of the data in spreadsheet format:
    - silent_voices_summary.xlsx: summary statistics at the codex level (cf. Table 2 in the paper)
    - codex_info.xlsx: folium-level metadata
    - manuscript_data_metadata.xlsx: text-region-level metadata
    - manuscript_data_metadata_rich.xlsx: the most convenient and complete version of the dataset, including the texts with automatically expanded abbreviations and the linguistic enrichment (lemmas and part-of-speech tags).
- code: Python notebooks (requiring Python >= 3.8):
  - transduction.ipynb: the notebook for replicating the abbreviation expansion experiments described in the paper. (See also norm.json, the configuration file for the tagger.)
  - enrich.ipynb: the notebook used for the linguistic enrichment of the expanded texts with the PIE lemmatizer (nlp-pie). (See also the PIE model file herne-norm.tar, which is used in the enrichment.)
  - requirements.txt: third-party dependencies for running the code in these notebooks. Note: enrich.ipynb additionally requires the Middle Dutch (DUM) model for nlp-pie.
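The per-folium PAGE XML files can be read with the Python standard library alone. The sketch below is a minimal example under stated assumptions, not part of the repository's code: it assumes that each manuscript zip holds one PAGE XML file per folium and that line-level transcriptions sit in TextLine/TextEquiv/Unicode elements (the usual Transkribus export layout). The helper names local_name, page_lines and zip_lines are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict, List


def local_name(tag: str) -> str:
    """Drop the XML namespace from a tag, e.g. '{ns}TextLine' -> 'TextLine'."""
    return tag.rsplit("}", 1)[-1]


def page_lines(xml_bytes: bytes) -> List[str]:
    """Return the line-level transcriptions of one PAGE XML file, in document order.

    Assumption: each TextLine carries its text in a direct TextEquiv/Unicode child.
    """
    root = ET.fromstring(xml_bytes)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        # Use the TextEquiv that is a direct child of the TextLine; word-level
        # TextEquiv elements (if present) sit deeper and are ignored here.
        for child in elem:
            if local_name(child.tag) == "TextEquiv":
                for unicode_elem in child:
                    if local_name(unicode_elem.tag) == "Unicode":
                        lines.append(unicode_elem.text or "")
                break
    return lines


def zip_lines(zip_source) -> Dict[str, List[str]]:
    """Map each .xml member of a zip (a path or file-like object) to its text lines.

    XML members without TextLine elements simply map to an empty list.
    """
    with zipfile.ZipFile(zip_source) as zf:
        return {
            name: page_lines(zf.read(name))
            for name in sorted(zf.namelist())
            if name.lower().endswith(".xml")
        }
```

Matching on local tag names rather than a fixed namespace URI keeps the sketch independent of the PAGE schema version that a given export declares. For a downloaded manuscript archive, `zip_lines(path_to_manuscript_zip)` would return one list of transcribed lines per folium file; the hyper-diplomatic text is returned as-is, without any abbreviation expansion.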
Related data
- The final Transkribus model used to generate the transcriptions will be made publicly available on the platform.
- The accompanying images are released in a separate, restricted-access repository on Zenodo, because we were unable to clear the copyright on some of the facsimiles. We will only be able to share these images under very strict conditions.
Acknowledgments
Thanks to Anouck Kuypers, Sam Verellen and Frans de Jonge for their work on the transcriptions. The transcription of Brussels, RL, 3093-95 was contributed by Dr. Ine Kiekens. We acknowledge the help of Renée Gabriël and Peter Boot in previous collaborations that relate to the present paper. Finally, we would like to thank Caroline Vandyck, who helped with the finalization of the dataset.
Funding statement
This work has been funded by the Flemish Research Agency (FWO) in the context of the project "Silent voices: A Digital Study of the Herne Charterhouse as a Textual Community (ca. 1350-1400)".
Files
Total size: 166.3 MB

| MD5 checksum | Size |
|---|---|
| 110fbf52e79f6325285c8782562dc8e1 | 5.9 MB |
| 30e47db7e93f867d0fc9b1c3292ada95 | 7.3 MB |
| 6acfb6a4179777bd09ae52929ee1d565 | 8.2 MB |
| 2ce0a784fd7cd22ec4719c16fec0f8fd | 32.7 MB |
| 64b50ea3cd1e61e54c0cb651b9f1a4d9 | 24.7 MB |
| fdc6d55e845ea5490e9c32702de041a1 | 4.6 MB |
| 809c2f1a590833f785affaee5d6bf018 | 583.7 kB |
| 1847c475e064eb5ea6821438b1529e0c | 4.4 MB |
| c845d2c7c067e92e110d3aa3b2817803 | 2.0 MB |
| f9f3e23bf256e21b9dd002ff870ef0e7 | 3.3 MB |
| c79d72729e050669f5384f5e3b4e221c | 13.6 MB |
| a61151e04bf5fdb56450b7f0bc2f7af0 | 3.1 MB |
| a3a00ec9ffa5a21d77882435e5e3663e | 486.7 kB |
| b0577e22222b69da83348dd2430158cf | 6.8 MB |
| 845818df33e16687e6e005d6ef8fbc8e | 3.2 MB |
| 12dda273f9f354e6d90f810233aa9a80 | 16.3 MB |
| 66fc0a1669f93792278ab2179f7cb133 | 4.9 MB |
| 6282a26f717b7b82a4f242d4b36ffc05 | 4.1 MB |
| 4bb1084ea438ffb5b0446dffa8d2d5c5 | 16.2 MB |
| 412fd53719b0300c0b492bf7146db6bc | 3.8 MB |
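The MD5 checksums listed above can be used to confirm that a download is complete and uncorrupted. The following is a minimal sketch using Python's standard hashlib module; the helper names md5sum and verify are hypothetical and not part of the repository's code. Because this listing does not record which checksum belongs to which file name, each download has to be matched to its row by hand (for instance by its size).

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, read in chunks (1 MiB by default)."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        # Read fixed-size chunks so that large archives need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected: str) -> bool:
    """Compare a file's MD5 digest with a listed checksum (an optional 'md5:' prefix is ignored)."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5sum(path) == expected
```

For example, `verify("downloaded_file.zip", "110fbf52e79f6325285c8782562dc8e1")`, where the file name is a placeholder, returns True only if the file's bytes hash to that checksum. MD5 is adequate for detecting accidental corruption, but it should not be relied on as protection against deliberate tampering.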
Additional details
Related works
- Is derived from: Dataset 10.5281/zenodo.10005275 (DOI)