BY-COVID D4.3 Provenance model for infectious diseases
Description
The exchange of research objects, such as specimens and research data, is common in modern research. However, several reports have identified problems with the reproducibility and quality of exchanged research objects, citing poor documentation of the processes used to generate specimens and data as one of the most common causes. These reports also point to the significant impact of flawed research results on the economy, health, and political decisions, as witnessed during the COVID-19 pandemic. To prevent these problems, professional societies and research initiatives call for improved and standardised documentation of the data and specimens used in research studies.
Provenance is information about the history of a documented object, be it physical or digital. In particular, provenance captures information about the entities, activities, and agents that were involved in or affected the creation of the object. Depending on the information it includes, provenance can serve various purposes, including tracing the precursors of documented objects. For instance, given the result of a data analysis, provenance can be used to trace the input data used for the analysis, the original data sources, the methods used to generate the data, or the sources for data generation, such as the biological material from which the data was generated (e.g., omics data, image data). In this way, provenance information can contribute substantially to the reproducibility of results, the assessment of their quality or fitness for purpose, auditability, and other purposes.
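To make the idea of tracing precursors concrete, here is a minimal Python sketch. It is not the deliverable's model: it represents provenance as a small graph in the spirit of W3C PROV (entities generated by activities, activities that use entities and are associated with agents) and walks the derivation chain back to the original sources. The class `ProvGraph`, its fields, and all identifiers in the example are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ProvGraph:
    # Hypothetical structure, loosely following W3C PROV relations:
    # entity ID -> activity that generated it (cf. prov:wasGeneratedBy)
    generated_by: dict[str, str] = field(default_factory=dict)
    # activity ID -> entity IDs it used as inputs (cf. prov:used)
    used: dict[str, list[str]] = field(default_factory=dict)
    # activity ID -> responsible agent (cf. prov:wasAssociatedWith)
    associated_with: dict[str, str] = field(default_factory=dict)

    def precursors(self, entity: str) -> list[str]:
        """Return every entity the given entity was (transitively) derived from."""
        seen: list[str] = []
        stack = [entity]
        while stack:
            current = stack.pop()
            activity = self.generated_by.get(current)
            if activity is None:
                continue  # an original source: nothing in this graph generated it
            for source in self.used.get(activity, []):
                if source not in seen:
                    seen.append(source)
                    stack.append(source)
        return seen


# Hypothetical chain: biopsy -> sample -> sequencing -> reads -> analysis -> variants
g = ProvGraph()
g.generated_by["sample-1"] = "biopsy"
g.generated_by["reads.fastq"] = "sequencing"
g.used["sequencing"] = ["sample-1"]
g.associated_with["sequencing"] = "lab-technician"
g.generated_by["variants.vcf"] = "analysis"
g.used["analysis"] = ["reads.fastq", "reference.fa"]
print(g.precursors("variants.vcf"))  # ['reads.fastq', 'reference.fa', 'sample-1']
```

The traversal stops at entities that no recorded activity generated, which is where the original sources (here the sample and the reference) sit in the chain.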
This deliverable describes a general provenance information model for infectious diseases. The underlying provenance model is designed to support distributed, multi-organisational provenance information, which arises when a documented object's life cycle spans multiple organisations: for example, a sample acquired in a hospital, data generated from the sample in a laboratory, and the data processed and analysed by a research group at a university or a private company. The model supports traceability of the precursors of a documented object (e.g., a dataset) by providing the data derivation chain back to the original sources of the data, such as the biological samples from which the data was generated. It supports the FAIRness (especially Reusability) of a documented object by providing standardised means to attach relevant provenance information to the object, or to link to it. Finally, the model allows parts of provenance traces to be kept confidential or anonymised, preserving the confidentiality and privacy of the respective personal data subjects, such as patients, donors, or responsible persons.
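The multi-organisational setting can be illustrated with a small Python sketch. It assumes, purely for illustration, that each organisation holds its own fragment of the trace, that fragments are linked only through shared entity identifiers, and that personal agents are replaced by a salted hash before a fragment is shared. Salted hashing is pseudonymisation rather than full anonymisation, and none of the names here (`Record`, `export_fragment`, `trace`, the organisations, the salt) come from the deliverable; they are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    entity: str                    # ID of the documented object (sample, dataset, result)
    derived_from: tuple[str, ...]  # precursor IDs, possibly held by another organisation
    agent: str                     # responsible person or data subject (personal data)
    organisation: str              # organisation that holds this part of the trace


def export_fragment(records: list[Record], organisation: str, salt: str) -> list[Record]:
    """Export one organisation's part of the trace with agents pseudonymised.

    derived_from links are kept unchanged so the chain stays traceable across
    organisations; only the agent field is replaced by a salted SHA-256 prefix.
    """
    fragment = []
    for r in records:
        if r.organisation != organisation:
            continue
        digest = hashlib.sha256((salt + r.agent).encode("utf-8")).hexdigest()[:12]
        fragment.append(Record(r.entity, r.derived_from, f"pseudonym:{digest}", r.organisation))
    return fragment


def trace(fragments: list[list[Record]], entity: str) -> list[tuple[str, str]]:
    """Follow derived_from links across fragments; return (organisation, entity) pairs."""
    index = {r.entity: r for fragment in fragments for r in fragment}
    chain, todo, seen = [], [entity], set()
    while todo:
        current = todo.pop()
        if current in seen or current not in index:
            continue  # already visited, or not shared by any organisation
        seen.add(current)
        rec = index[current]
        chain.append((rec.organisation, rec.entity))
        todo.extend(rec.derived_from)
    return chain


# Hypothetical life cycle: hospital sample -> laboratory reads -> university result
records = [
    Record("sample-42", (), "Jane Doe (patient)", "hospital"),
    Record("reads-7", ("sample-42",), "lab technician", "laboratory"),
    Record("result-3", ("reads-7",), "analyst", "university"),
]
fragments = [export_fragment(records, org, salt="per-project-secret")
             for org in ("hospital", "laboratory", "university")]
print(trace(fragments, "result-3"))
# [('university', 'result-3'), ('laboratory', 'reads-7'), ('hospital', 'sample-42')]
```

If an organisation withholds its fragment entirely, the trace simply stops at the last shared identifier, which is one simple way a chain can remain partially confidential.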
The deliverable also describes a set of RO-Crate [Soiland-Reyes 2022a] profiles that capture the provenance of computational workflow executions at different levels of granularity. The RO-Crate format enables the available artefacts and their metadata to be described and packaged in a lightweight, de facto standard format.
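As a rough illustration of what such a crate can look like, the Python sketch below builds a minimal RO-Crate 1.1 metadata document (`ro-crate-metadata.json`) for a single tool run. It assumes that a run is modelled as a schema.org `CreateAction` linking the tool (`instrument`), its inputs (`object`), and its outputs (`result`), which is the pattern used by the Workflow Run RO-Crate community profiles; the exact profiles and properties in the deliverable may differ. The tool URL and file names are hypothetical, and a valid crate would also need properties such as `name`, `description`, `license`, and `datePublished` on the root dataset.

```python
import json


def make_run_crate(tool_id: str, inputs: list[str], outputs: list[str]) -> dict:
    """Build a minimal RO-Crate 1.1 metadata document describing one tool run."""
    files = inputs + outputs
    graph = [
        {   # metadata descriptor required by RO-Crate 1.1
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # root data entity (a real crate also needs name, description, license, ...)
            "@id": "./",
            "@type": "Dataset",
            "hasPart": [{"@id": f} for f in files],
            "mentions": [{"@id": "#run-1"}],
        },
        {"@id": tool_id, "@type": "SoftwareApplication"},
        {   # the execution: which tool ran, what it consumed, what it produced
            "@id": "#run-1",
            "@type": "CreateAction",
            "instrument": {"@id": tool_id},
            "object": [{"@id": f} for f in inputs],
            "result": [{"@id": f} for f in outputs],
        },
    ]
    graph += [{"@id": f, "@type": "File"} for f in files]
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


crate = make_run_crate("https://example.org/tools/variant-caller",  # hypothetical tool
                       ["reads.fastq"], ["variants.vcf"])
print(json.dumps(crate, indent=2))
```

Coarser or finer granularity would then amount to describing either one action for the whole workflow run or one action per executed step.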
Files
BY-COVID_D4.3_Provenance_model_for_infectious_diseases.pdf (2.8 MB, md5:c5666299f2c7681205dc05bde57db896)