Presentation Open Access

Metadata-driven Scientific Use File data management

Bela, Daniel

With several excellent software tools emerging for the task, survey instrument creation has become more and more structured and automated in recent years. However, after field work has been done, it is usually up to one or several data managers in research institutions to process the data files and create ready-to-use analysis datasets and documentation. In social research, this data management process is often badly structured and documented, and seldom automated. Many of the procedures that have to be run in order to create usable datasets, however, can be fully or partly automated as soon as the procedures themselves are structured appropriately. In order to deal with the vast amount of incoming field data from the German National Educational Panel Study (NEPS), the data management team at LIfBi (in cooperation with partners across Germany) implemented such a structured and semi-automated approach for creating and updating the Scientific Use Files for the six panel cohorts of NEPS. This was achieved by conceptually separating several data management tasks from each other, and by creating interface steps for exchanging data extracts with external partners (e.g. for coding text answers from the surveys or generating additional variables). Additionally, every step of the data management process that could be automated by re-using information from the survey instruments or field documentation (e.g. renaming of variables, labeling, translation) has been designed to make use of this potential for automation. As a result, a large amount of additional meta-information, such as full questionnaire texts, is now integrated directly into the NEPS Scientific Use File datasets. Based on these experiences, a sketch of 'best practice' solutions for implementing a metadata-driven data management workflow can be established.
This presentation will focus on conceptual solutions to improve data management procedures in order to make them more structured, better documented, and less error-prone. Ultimately, this approach can lead to better survey data for analyses and reduce unsystematic variance in data management procedures, which otherwise necessarily results in (in the best case) a large workload of fixing data afterwards or (in the worst case) biased research results.
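
The metadata-driven steps named above (renaming of variables, labeling, translation) can be illustrated with a minimal sketch. It is not the NEPS implementation: the `VariableMeta` structure, the `apply_metadata` function, and all variable names and labels are hypothetical and only show how a single metadata table, exported from a survey instrument, could drive both the renaming of raw field variables and the creation of a labeled codebook in a chosen language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableMeta:
    """Hypothetical metadata entry exported from a survey instrument."""
    raw_name: str                   # variable name as delivered from field work
    final_name: str                 # variable name in the released dataset
    labels: dict[str, str]          # variable label per language code
    question_text: dict[str, str]   # full questionnaire text per language code


def apply_metadata(rows, metadata, lang="en"):
    """Rename raw variables and build a codebook from the same metadata.

    rows: list of dicts keyed by raw variable name.
    Returns (renamed_rows, codebook).
    """
    by_raw = {m.raw_name: m for m in metadata}

    # Fail loudly on undocumented variables instead of passing them through.
    unknown = {name for row in rows for name in row} - set(by_raw)
    if unknown:
        raise KeyError(f"no metadata for variables: {sorted(unknown)}")

    renamed = [
        {by_raw[name].final_name: value for name, value in row.items()}
        for row in rows
    ]
    codebook = {
        m.final_name: {
            "label": m.labels[lang],
            "question": m.question_text[lang],
        }
        for m in metadata
    }
    return renamed, codebook


# Hypothetical example data, for illustration only.
EXAMPLE_METADATA = [
    VariableMeta(
        raw_name="q1",
        final_name="satisfaction",
        labels={"de": "Zufriedenheit", "en": "Satisfaction"},
        question_text={
            "de": "Wie zufrieden sind Sie insgesamt?",
            "en": "How satisfied are you overall?",
        },
    ),
]
EXAMPLE_ROWS = [{"q1": 3}, {"q1": 5}]
```

Because labels and question texts come from the same table as the variable names, switching the documentation language only requires changing the `lang` argument, and a variable without a metadata entry stops the run rather than ending up undocumented in the dataset.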
