Automating subject indexing at ZBW – the costs of the digital transformation and why we need fewer projects
- 1. ZBW – Leibniz Information Centre for Economics
Subject indexing, i.e., the enrichment of metadata records for textual resources with descriptors from a controlled vocabulary, is one of the core activities of libraries. Due to the proliferation of digital documents, it is no longer possible to index every single document intellectually, which is why we need to explore the potential of automation at every level.
At ZBW, efforts to partially or completely automate the subject indexing process started as early as 2000, with experiments involving external partners and commercial software. In 2014, the decision was made to conduct the necessary applied research in-house, which was successfully implemented by establishing a PhD position. However, the prototypical machine learning solutions developed over the following years had yet to be integrated into productive operations at the library. Therefore, in 2020, an additional position for a software engineer was established and a pilot phase was initiated (planned to last until 2024). Its goal is to complete the transfer of our solutions into practice by building a suitable software architecture that allows for real-time subject indexing with our trained models, and by integrating this architecture into the other metadata workflows at ZBW.
In this talk, we report on the milestones we have reached so far and on those that are yet to be reached at an operational level. We also discuss the challenges we faced at a strategic level: the measures and resources (hardware, software, personnel) that were needed to effect the transfer, and those that will be necessary to ensure the continued availability of the architecture and to enable continuous development during ongoing operations.
We argue that, in general, the “project” format and the mindset that goes with it may not suffice to secure the long-term commitment that an institution, its decision-makers and the library community as a whole will have to bring to the table in order to face the monumental task of digital transformation and automation.