Published May 23, 2022 | Version v1
Lesson Open

Workshop: Automatically extract text, layout and metadata information from XML-files of OCR-ed historical texts. [abstract]

  • 1. KB, the national library of the Netherlands

Description

In Digital Humanities (DH), researchers often want to analyze large collections of historical texts. Digital heritage institutions typically store these texts in a variety of XML formats, so being able to process XML documents is an invaluable skill for DH researchers. In this workshop, participants will learn how to use Python to extract text, layout and metadata information from various common XML formats.
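As a rough illustration of the kind of task described above, the following minimal sketch uses Python's standard-library `xml.etree.ElementTree` to pull text, layout coordinates and one metadata field out of an OCR-style XML document. The sample XML is made up for this example and only loosely modeled on ALTO-style output (no namespace, simplified structure). The function name `extract_page` and the dictionary keys are hypothetical, and this is not taken from the workshop materials.

```python
import xml.etree.ElementTree as ET

# Made-up sample, loosely modeled on ALTO-style OCR output (assumption:
# no XML namespace, simplified element structure).
SAMPLE_XML = """<alto>
  <Description>
    <fileName>example_page.xml</fileName>
  </Description>
  <Layout>
    <Page ID="P1" WIDTH="2000" HEIGHT="3000">
      <TextBlock ID="TB1">
        <TextLine ID="TL1" HPOS="100" VPOS="200">
          <String CONTENT="Historical" HPOS="100" VPOS="200"/>
          <String CONTENT="newspaper" HPOS="320" VPOS="200"/>
        </TextLine>
        <TextLine ID="TL2" HPOS="100" VPOS="260">
          <String CONTENT="text" HPOS="100" VPOS="260"/>
        </TextLine>
      </TextBlock>
    </Page>
  </Layout>
</alto>"""


def extract_page(xml_text):
    """Return (metadata, lines) extracted from one OCR page (hypothetical helper)."""
    root = ET.fromstring(xml_text)

    # Metadata: a single field read from the Description element.
    metadata = {"file_name": root.findtext("Description/fileName")}

    # Text and layout: one record per TextLine, with its position on the page.
    lines = []
    for line in root.iter("TextLine"):
        words = [s.get("CONTENT", "") for s in line.findall("String")]
        lines.append({
            "id": line.get("ID"),
            "hpos": int(line.get("HPOS")),
            "vpos": int(line.get("VPOS")),
            "text": " ".join(words),
        })
    return metadata, lines


metadata, lines = extract_page(SAMPLE_XML)
for line in lines:
    print(line["id"], (line["hpos"], line["vpos"]), line["text"])
```

Real files from heritage institutions usually declare XML namespaces, in which case the tag names passed to `iter`, `findall` and `findtext` would need the namespace prefix or a namespace mapping.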

Files

Abstract_workshop_v2.pdf (94.4 kB)
md5:4c6664d85f0fbc9cdb32feb19b550eee