Integrating Long-Term Access into DataPLANT Data Management Workflows
Providing access to research data while upholding the FAIR principles presents unique challenges. To facilitate reuse, it is not enough to archive the research data itself; the accompanying research software, e.g., computational workflows, must also be described using metadata and archived. For this purpose, DataPLANT developed the Annotated Research Context (ARC) specification. This work extends the ARC concept for long-term access by also detecting, describing, and preserving implicit software dependencies, e.g., Docker images or host operating system kernels, which might otherwise no longer be available at the time of future reuse. It further addresses implicit long-term hardware dependencies using emulation in a generic and descriptive way. It shows that preservation of research software must already be an integral part of development workflows rather than an afterthought, and it shows how existing work can be integrated into DataPLANT's research data management system to turn an archived ARC into a long-term re-executable ARC.