Presentation Open Access

Data processing of ILS data to facilitate a new discovery layer for the German Literature Archive (DLA)

Thomas Meyer; Felix Lohmeier

Kallías, the OPAC of the German Literature Archive in Marbach, is used by scholars worldwide as an information system and as a gateway to literary sources. It provides five entry points to the collections: manuscripts, library objects, images and objects, holdings, and names. These reflect the high-quality cataloging carried out in the institution's different divisions. Since 2017, a new discovery layer has been under development to integrate all sources into a cross-media, tailor-made online catalog. Although it uses a classic Solr-based approach rather than linked data, the new catalog makes productive use of authority data and of relationships between works and special collections.

The new catalog is still in closed beta and is scheduled for release at the end of 2019. The presentation focuses on the custom data processing pipeline, which is based on the open source tools Pandas (a Python library) and OpenRefine. Every day, four million records are extracted from the local ILS, transformed into a tabular format, manipulated with custom rulesets, enriched with external data sources, and loaded into a Solr index. The pipeline is orchestrated with simple Bash shell scripts, which makes it easy to extend the workflow with other command line tools. Making legacy ILS data available in OpenRefine enables library staff to use their data in other contexts (e.g. digitization projects) and to publish it in different formats (e.g. EAD-XML for the Kalliope union catalog).
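The daily extract, transform, enrich, and load steps can be illustrated with a small sketch. This is not the DLA pipeline itself: it deliberately uses only the Python standard library instead of Pandas and OpenRefine so that it runs without dependencies, and the export layout, field names (`author_id`, `author_name`), sample records, and authority lookup are all hypothetical. It stops at producing a JSON array of documents, a shape that Solr's JSON update handler is generally able to accept; actually sending it to a Solr instance is left out.

```python
import csv
import io
import json

# Hypothetical ILS export: field names and values are invented for illustration.
ILS_EXPORT = """id;title;author_id
1; Die Räuber ;p1
2;Faust;p2
3;Untitled;
"""

# Hypothetical authority data keyed by a local person identifier.
AUTHORITY = {"p1": "Schiller, Friedrich", "p2": "Goethe, Johann Wolfgang von"}


def extract(text):
    """Parse the delimited export into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))


def transform(rows):
    """Apply a simple rule: trim whitespace and drop empty fields."""
    cleaned = []
    for row in rows:
        stripped = {key: (value or "").strip() for key, value in row.items()}
        cleaned.append({key: value for key, value in stripped.items() if value})
    return cleaned


def enrich(rows, authority):
    """Add the author's name from the authority lookup where an ID matches."""
    enriched = []
    for row in rows:
        doc = dict(row)
        name = authority.get(doc.get("author_id"))
        if name:
            doc["author_name"] = name
        enriched.append(doc)
    return enriched


def to_solr_json(docs):
    """Serialize the documents as a JSON array for a Solr update request."""
    return json.dumps(docs, ensure_ascii=False)


docs = enrich(transform(extract(ILS_EXPORT)), AUTHORITY)
payload = to_solr_json(docs)
```

Keeping each stage as a separate function mirrors the idea of a pipeline whose steps can be swapped or extended independently, much as the shell scripts described above allow other command line tools to be added.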

Files (14.3 MB)
ELAG2019_Meyer-Lohmeier_2019-05-09.pdf (14.3 MB, md5:905d38b174a5aa0c2eb59ae579124a52)