Published March 10, 2026 | Version v1
Presentation Open

Automation of content indexing with machine learning methods and Annif – proven models in productive operation and initial experiments with transformer models

  • 1. ZBW - Leibniz-Informationszentrum Wirtschaft Standort Hamburg
  • 2. ZBW Leibniz Information Centre for Economics

Description

In 2020, the ZBW launched a pilot phase to develop an AI-based content indexing service (the ‘AutoSE service’) built on machine learning methods that we develop or adapt in-house as part of our applied computer science research. The service has been in operation since 2021 and is continuously optimised and expanded. It is based on the open-source toolkit Annif, supplemented by additional components that store the results of automated content indexing and exchange data with other in-house metadata systems. The presentation covers milestones and challenges that arose while developing a production service on top of an open-source project such as Annif. It also discusses experiments with a more recent generation of language models based on transformer architectures, conducted to test what they can do for automated content indexing, and presents the insights gained from the initial results along with considerations on how such models could be integrated into AutoSE's production operation.

Files

dbvKI2026AutoSE.pdf (1.1 MB)
md5:9b2a27ab46a727cc3fb532f347da7979