Presentation Open Access

Three perspectives on a collaborative attempt to use computer vision techniques to automatically classify historical newspaper images

Martijn Kleppe; Thomas Smits; Willem Jan Faber

Presentation given at the Workshop - Twin Talks: Understanding Collaboration in DH at DHN 2019, see also


In the last couple of years, scholars in the Humanities have started to explore the possibilities of large-scale image analysis. This development can be linked to the increasing availability of large visual datasets, the increase in computing power, and the development of new techniques, such as convolutional neural networks. However, there are no one-size-fits-all researchers who are able to gather the right data, apply the new techniques, and analyze the results in meaningful ways. In this paper we present the collaboration of a Humanities researcher, a Research Software Engineer, and a Digital Scholarship Advisor to explore how new computer vision techniques can be used to automatically classify images extracted from a large collection of digitized historical newspapers. We will present the outcomes of our research and share the lessons we learned from our collaboration. First, we will discuss the experiences of the Humanities researcher. Second, we will discuss the lessons we learned from a technical perspective. Third, we will elaborate on the institutional perspective of the National Library of the Netherlands (KB), both as a data provider and as a full partner in the research project. We will end with a reflection on the broader strategic role of heritage institutes as research partners that stimulate research, collaborate in it, and preserve the results of research projects in a sustainable manner.


