Published May 23, 2022 | Version v1
Lesson Open

Workshop: Automatically extract text, layout and metadata information from XML-files of OCR-ed historical texts. [abstract]

  • 1. KB, the national library of the Netherlands


In the domain of Digital Humanities (DH), researchers are often interested in analyzing large collections of historical texts. Digital heritage institutions typically store these texts in a variety of XML formats, so being able to process XML documents is an invaluable skill for DH researchers. In this workshop, participants will learn how to use Python to extract information and data from various common XML formats.
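As a rough illustration of the kind of task the workshop addresses, the sketch below uses Python's standard library to pull text and line positions out of a small OCR-style XML fragment. The sample is hand-written in the style of ALTO, which is assumed here as one example of a common format; the abstract does not name specific formats. The namespace version, the element and attribute values, and the function name `extract_lines` are likewise illustrative assumptions, not workshop material.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample in the style of ALTO OCR output.
# Real files may use a different namespace version and carry many more attributes.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine HPOS="100" VPOS="200">
            <String CONTENT="Historical"/>
            <SP/>
            <String CONTENT="text"/>
          </TextLine>
          <TextLine HPOS="100" VPOS="240">
            <String CONTENT="example"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""

# Prefix-to-URI mapping so the search paths below stay readable (assumed namespace).
NS = {"alto": "http://www.loc.gov/standards/alto/ns-v2#"}


def extract_lines(xml_text):
    """Return one dict per text line: its position and its words joined by spaces."""
    root = ET.fromstring(xml_text)
    lines = []
    for line in root.iterfind(".//alto:TextLine", NS):
        # Each recognised word sits in the CONTENT attribute of a String element.
        words = [s.get("CONTENT", "") for s in line.iterfind("alto:String", NS)]
        lines.append(
            {
                "hpos": int(line.get("HPOS")),
                "vpos": int(line.get("VPOS")),
                "text": " ".join(words),
            }
        )
    return lines


if __name__ == "__main__":
    for entry in extract_lines(SAMPLE):
        print(entry["vpos"], entry["hpos"], entry["text"])
```

For files on disk, `ET.parse(path).getroot()` could replace `ET.fromstring`, and the same loop would apply as long as the namespace in `NS` matches the one declared in the file.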


