Lesson Open Access

Workshop: Automatically extract text, layout and metadata information from XML-files of OCR-ed historical texts. [abstract]

Cuper, Mirjam

In Digital Humanities, researchers often want to analyze large collections of historical texts. Digital heritage institutions usually store these texts in a variety of XML formats. Being able to process XML documents is therefore an invaluable skill for DH researchers. In this workshop, participants will learn how to use Python to extract information and data from various common XML formats.
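As a rough illustration of the kind of task the abstract describes, the sketch below reads text and simple layout coordinates from a small inline XML sample using Python's standard library. The abstract does not name specific formats, so the ALTO-style element names (`TextLine`, `String`, `CONTENT`, `HPOS`, `VPOS`), the namespace, and the helper `extract_lines` are assumptions chosen for illustration, not the workshop's actual material.

```python
import xml.etree.ElementTree as ET

# Hypothetical ALTO-style sample; real OCR output is far larger and
# may use a different schema version or a different format entirely.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page ID="P1">
      <TextBlock ID="TB1">
        <TextLine HPOS="100" VPOS="200">
          <String CONTENT="Historical" WC="0.95"/>
          <String CONTENT="text" WC="0.80"/>
        </TextLine>
        <TextLine HPOS="100" VPOS="240">
          <String CONTENT="example" WC="0.60"/>
        </TextLine>
      </TextBlock>
    </Page>
  </Layout>
</alto>"""

# Prefix-to-URI mapping so the namespaced elements can be searched.
NS = {"a": "http://www.loc.gov/standards/alto/ns-v3#"}


def extract_lines(xml_text):
    """Return one dict per text line with its position and joined words."""
    root = ET.fromstring(xml_text)
    lines = []
    for line in root.iterfind(".//a:TextLine", NS):
        # Each word is stored in the CONTENT attribute of a String element.
        words = [s.get("CONTENT", "") for s in line.iterfind("a:String", NS)]
        lines.append({
            "hpos": int(line.get("HPOS")),
            "vpos": int(line.get("VPOS")),
            "text": " ".join(words),
        })
    return lines


lines = extract_lines(SAMPLE)
for entry in lines:
    print(entry["vpos"], entry["text"])
```

For files on disk, `ET.parse(path).getroot()` would take the place of `ET.fromstring`. Other formats would need their own element names and namespace.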
