Workshop: Automatically extract text, layout and metadata information from XML-files of OCR-ed historical texts. [abstract]

doi:10.5281/zenodo.6574964

DHBenelux 2022

Published May 23, 2022 | Version v1

Lesson Open

Cuper, Mirjam¹

1. KB, the national library of the Netherlands

In Digital Humanities, researchers often want to analyze large collections of historical texts. Digital heritage institutions typically store these texts in a variety of XML formats, so being able to process XML documents is an invaluable skill for DH researchers. In this workshop, participants will learn how to use Python to extract information and data from various common XML formats.
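
As a rough illustration of the kind of task described above, the sketch below reads text and layout coordinates from a small OCR file using only Python's standard library. The abstract does not name the formats covered, so the choice of an ALTO-style document is an assumption; the sample XML is invented and heavily simplified, and the function name `extract_lines` is hypothetical rather than taken from the workshop materials.

```python
import xml.etree.ElementTree as ET

# Invented, heavily simplified ALTO-style OCR output (illustration only;
# real files contain many more elements and attributes).
SAMPLE_XML = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1" WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine HPOS="100" VPOS="200">
            <String CONTENT="Amsterdam," WC="0.95"/>
            <SP/>
            <String CONTENT="23" WC="0.80"/>
            <SP/>
            <String CONTENT="Mei." WC="0.60"/>
          </TextLine>
          <TextLine HPOS="100" VPOS="260">
            <String CONTENT="Heden" WC="0.90"/>
            <SP/>
            <String CONTENT="morgen" WC="0.85"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""

# The elements live in an XML namespace, so lookups need a prefix mapping.
NS = {"alto": "http://www.loc.gov/standards/alto/ns-v2#"}


def extract_lines(xml_text):
    """Return one dict per text line with its position and joined words.

    Hypothetical helper: assumes the simplified structure shown above.
    """
    root = ET.fromstring(xml_text)
    lines = []
    for text_line in root.iterfind(".//alto:TextLine", NS):
        # Each word is stored in the CONTENT attribute of a String element.
        words = [s.get("CONTENT", "") for s in text_line.iterfind("alto:String", NS)]
        lines.append(
            {
                "hpos": int(text_line.get("HPOS")),
                "vpos": int(text_line.get("VPOS")),
                "text": " ".join(words),
            }
        )
    return lines


for line in extract_lines(SAMPLE_XML):
    print(line["vpos"], line["text"])
```

The same pattern (parse, map the namespace, iterate over the elements of interest, read attributes) should carry over to other XML formats, although element and attribute names will differ per format.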

Files

Abstract_workshop_v2.pdf (94.4 kB, md5:4c6664d85f0fbc9cdb32feb19b550eee)

Resource type

Lesson

Publisher

Zenodo

Conference

DH Benelux 2022 - ReMIX: Creation and alteration in DH (hybrid)

Languages

English

Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: May 23, 2022
Modified: May 24, 2022
