Presentation Open Access
Barend Mons states in his 2020 Nature article that around 5% of the overall research budget should go towards research data management (RDM) activities. However, such flat-rate figures ignore differences across research projects and may call into question higher requested RDM budgets when projects are more complex. Research on RDM costs is still in its infancy, and precise statements about which RDM activities are eligible for funding are rare. Clearer and more precise recommendations on how to budget RDM activities in research projects will help to raise awareness of the importance of RDM, ensure that appropriate time and effort are devoted to RDM tasks, and help researchers account for these tasks when applying for research funding. Consequently, they will improve data sharing and reuse.
Budgeting RDM activities is difficult, however, because the costs depend on the type of data, their volume, and the information they contain. In this paper, we investigate the costs of RDM activities, specifically data cleaning and documentation. We draw on a pilot study conducted at the GESIS Data Archive for the Social Sciences in Germany between December 2016 and September 2017. During this period, data curators at the archive recorded their working hours while cleaning and documenting data from ten quantitative survey studies. We analyze these records and interview the data curators to identify and examine important cost drivers in RDM, i.e., aspects that increase the hours spent on these tasks, as well as factors that reduce the workload.