Published June 26, 2015 | Version v1
Presentation Open

12.2 Open and Reproducible Analytical Workflows in the Humanities: The Case of Finnish Bibliographic Metadata

Description

The principles of open science are often applauded but, in the humanities, rarely implemented in practice. This will change with concrete proof of the usefulness of shared open research data and methods in collaborative projects (involving libraries, researchers, data management, faculty and students). Our idea is to establish and study Reproducible Analytical Workflows (RAW) by constructing open source environments that combine data and information resources, statistical analysis and automated reports. To introduce the RAW model to the humanities in Finland, we are launching a pilot project at the University of Helsinki. It builds on our earlier experiments with early modern British and American metadata, in which we investigated the English Short Title Catalogue (ESTC) within a reproducible workflow to provide transparent quantitative analysis of knowledge production: https://github.com/rOpenGov/estc/blob/master/inst/examples/summary.md

This paper considers similar tools constructed to analyse the Finnish National Bibliography (1488–1917), built on the same quantitative, open source research tools that we developed in the context of the British Library Data Collections. Our aim is to make these methods widely known and accessible, and to bring them into university education as early as the undergraduate level. Integrating the services of the data management infrastructure is an important component of the automated workflow. FIN-CLARIN, which hosts the Language Bank of Finland, offers a repository for the bibliographic data and a search engine, so that students and teachers can select and download data samples in a familiar search environment. The data can then be processed with standard analytical tools and compared with visualizations of the original data. Researchers and more advanced students may go on to modify the visualizations for their own purposes as reproducible statistical workflows.
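As a rough illustration of the kind of step such a workflow chains together (load a downloaded sample, compute a summary statistic, emit a report), the following minimal Python sketch counts records per decade and renders the result as a small Markdown table. It is not the project's own code, which is published in the repository linked above. The column names, the sample rows and the function names here are all hypothetical and chosen only for this example.

```python
import csv
import io
from collections import Counter

# Hypothetical sample: the column names and rows below are invented for
# illustration and are not taken from the Finnish National Bibliography.
SAMPLE_CSV = """title,publication_year
Example title A,1642
Example title B,1648
Example title C,1755
Example title D,
Example title E,1759
"""


def load_years(text):
    """Parse publication years, skipping records with no usable year."""
    years = []
    for row in csv.DictReader(io.StringIO(text)):
        value = (row.get("publication_year") or "").strip()
        if value.isdigit():
            years.append(int(value))
    return years


def titles_per_decade(years):
    """Count records per decade, keyed by the first year of the decade."""
    return Counter(year - year % 10 for year in years)


def render_report(counts):
    """Render the decade counts as a small Markdown table."""
    lines = ["| Decade | Records |", "| --- | --- |"]
    for decade in sorted(counts):
        lines.append(f"| {decade}s | {counts[decade]} |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_report(titles_per_decade(load_years(SAMPLE_CSV))))
```

Because the report is generated entirely from the input data, rerunning the script on a different downloaded sample regenerates the table without manual steps, which is the property a reproducible workflow relies on.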

The University Library's information literacy teaching will further integrate the RAW model into the curriculum. To introduce students to open data methods, teaching will start with a simple interface and data samples, and will aim at real research use of the data. The goal is to build an understanding of data, methods, collaboration and reporting in the open science process, so that the teaching as a whole is constructively aligned. A problem-driven approach to teaching, supported by teamwork and social tools, will enhance the learning outcomes. The open science workflow involves the library at every step of the way.

Professor Mikko Tolonen has a background in intellectual history. His monograph, Mandeville and Hume: Anatomists of Civil Society, published in Oxford University Studies in the Enlightenment in 2013, combined the study of the history of philosophy with book history. According to some reviews, it was a pioneering approach. He currently continues his efforts to use book history to answer research questions in the tradition of intellectual history, in a multidisciplinary project with data scientist Dr Leo Lahti. He has also collaborated widely internationally, for example publishing with Noel Malcolm on Thomas Hobbes’s correspondence, based on the use of auction sale catalogues.
