Presentation Open Access

Data processing of ILS data to facilitate a new discovery layer for the German Literature Archive (DLA)

Thomas Meyer; Felix Lohmeier

Kallías, the OPAC of the German Literature Archive in Marbach, is used by scholars worldwide as an information system and for access to literary sources. It provides five entry points to the collections: manuscripts, library objects, images and objects, holdings, and names. These entry points reflect the high-quality cataloging carried out in the institution's different divisions. Since 2017, a new discovery layer has been under development to integrate all sources into a cross-media, tailor-made online catalog. Although it uses a classic Solr-based (non-linked-data) approach, the new catalog makes productive use of authority data and of relationships between works and special collections.

The new catalog is still in closed beta and is scheduled for release at the end of 2019. The presentation will focus on the custom data processing pipeline, which is based on the open source tools Pandas (a Python library) and OpenRefine. Every day, four million records are extracted from the local ILS, transformed into a tabular format, manipulated with custom rulesets, enriched with external data sources, and loaded into a Solr index. The pipeline is orchestrated with simple Bash shell scripts, which makes it easy to extend the workflow with other command line tools. Making legacy ILS data available in OpenRefine enables library staff to use their data in other contexts (e.g. for digitization projects) and to publish it in different formats (e.g. EAD-XML for the Kalliope union catalog).
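The daily extract, transform, enrich, and load steps described above can be illustrated with a minimal sketch. It is not the DLA pipeline: it uses only the Python standard library instead of Pandas and OpenRefine so that it stays self-contained, and every record, field name, rule, and authority entry below is a hypothetical placeholder. The final step only builds a JSON array of documents, a shape commonly sent to a Solr update handler, and does not contact a server.

```python
import csv
import io
import json

# Hypothetical ILS export; ids, field names, and values are invented.
RAW_RECORDS = [
    {"id": "HS001", "type": "Manuscript", "title": " Brief an Cotta ", "person_id": "P1"},
    {"id": "BF002", "type": "Book", "title": "Gedichte", "person_id": "P2"},
]

# Hypothetical external authority data, keyed by person identifier.
AUTHORITY = {"P1": "Schiller, Friedrich"}

# Hypothetical rule set: each rule names a column and a cleanup function.
RULES = [
    ("title", str.strip),
    ("type", str.lower),
]

COLUMNS = ["id", "type", "title", "person_id"]


def to_table(records, columns):
    """Serialize extracted records into a tabular (CSV) text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def read_table(text):
    """Read the tabular text back as a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def apply_rules(rows, rules):
    """Apply every (column, function) rule to every row."""
    for row in rows:
        for column, func in rules:
            row[column] = func(row[column])
    return rows


def enrich(rows, authority):
    """Add a person_name column looked up from the authority data."""
    for row in rows:
        row["person_name"] = authority.get(row["person_id"], "")
    return rows


def to_solr_json(rows):
    """Build a JSON array of documents, ready to be sent to an index."""
    return json.dumps(rows, ensure_ascii=False)


def run_pipeline(records, authority):
    """Chain the steps: extract -> table -> rules -> enrichment."""
    table = to_table(records, COLUMNS)
    rows = read_table(table)
    rows = apply_rules(rows, RULES)
    return enrich(rows, authority)


if __name__ == "__main__":
    docs = run_pipeline(RAW_RECORDS, AUTHORITY)
    print(to_solr_json(docs))
```

Keeping each step as a separate function mirrors the idea of a pipeline whose stages can be swapped or extended independently, in the same way that the shell-script orchestration allows other command line tools to be added.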
