ISO 23494: Biotechnology – Provenance Information Model for Biological Specimen And Data

Abstract. Exchange of research data and samples has become common in biomedical research, which calls for effective assessment of their quality. At the same time, several reports address the reproducibility of research, where the history of biological samples (acquisition, processing, transportation, storage, and retrieval) and the history of data (generation and processing) define their fitness for purpose, and hence their quality. This project aims to develop a comprehensive provenance information standard, based on W3C PROV, for the biomedical research domain. The standard is being developed by Working Group 5 ("Data processing and integration") of Technical Committee 276 "Biotechnology" of the International Organization for Standardization (ISO). The outcome of the project will be published in parts as International Standards or Technical Specifications. The poster presents the goals of the standardisation activity and the proposed structure of the standard, briefly describes its current state, and outlines future development and open issues.

Introduction
Research in the life sciences has changed significantly in recent years, evolving from individual projects confined to small research groups to transnational consortia covering a wide range of techniques and expertise. At the same time, several reports addressing the quality of research papers in the life sciences have uncovered an alarming number of ill-founded claims. The reasons for these deficiencies are diverse, with insufficient quality and documentation of the biological material used being the major issue [1][2][3]. Hence there is an urgent need for standardized and comprehensive documentation of the whole workflow, from the collection, generation, processing and analysis of the biological material to data analysis and integration.
The PROV [4] family of documents is the current standard for provenance information, i.e. information describing the history of an object. However, as discussed in the results of the EHR4CR and TRANSFoRm projects [5,6], its implementation for the biotechnology domain, and for biomedical research in particular, is still pending. To address this, the International Organization for Standardization (ISO) initiated the development of the Provenance Information Model for Biological Specimen and Data standard. It defines the requirements for interoperable, machine-actionable documentation describing the complete process chain, from the source of biological material through its processing and analysis, and through all steps of data generation and data processing, to final data analysis.
The standard is intended for implementers and suppliers of hardware and software tools used in biomedical research (e.g. lab automation devices or analytical devices used for research purposes) and for organisations adopting the generated provenance (e.g. by requiring or using standardised tools).

Goals of the Standard and Its Structure
The main goals of the standard are to (a) enable effective assessment of the quality and fitness for purpose of the objects provided, such as biological material and data; (b) support reproducible research by requiring the capture of all relevant information; (c) track error propagation within scientific results; (d) track the source of biological material in order to prevent fabrication of data and to enable notification of subjects in case of relevant incidental findings; (e) propagate withdrawal of, or changes to, an informed consent along the process chain.
The proposed structure of the standard reflects the intention to interconnect and integrate distributed provenance information furnished by all kinds of organisations involved in biotechnology research. Examples of such organisations are hospitals, biobanks, research centers, universities, data centers and pharmaceutical companies. Each of them participates in research and thus generates provenance information describing its particular activities or contributions.
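Goals (d) and (e) both amount to walking a chain of derivations: upstream to find the source of an item, and downstream to find everything affected by a change at the source. The following is a minimal sketch of that idea in plain Python, not the model defined by the standard. The derivation pairs loosely follow the spirit of the PROV derivation relation, and all identifiers and function names are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical derivation records: (derived item, item it was derived from).
# All identifiers are invented for illustration only.
DERIVATIONS = [
    ("sample:aliquot-1", "sample:blood-001"),
    ("sample:aliquot-2", "sample:blood-001"),
    ("data:sequence-1", "sample:aliquot-1"),
    ("data:variants-1", "data:sequence-1"),
    ("data:sequence-9", "sample:blood-002"),
]


def _reachable(start, edges):
    """Breadth-first search over an adjacency mapping, excluding `start`."""
    found = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in edges.get(current, []):
            if neighbour not in found:
                found.add(neighbour)
                queue.append(neighbour)
    return found


def affected_by_withdrawal(source, derivations):
    """Items derived directly or indirectly from `source` (goal e)."""
    children = defaultdict(list)
    for derived, origin in derivations:
        children[origin].append(derived)
    return _reachable(source, children)


def sources_of(item, derivations):
    """Items that `item` was directly or indirectly derived from (goal d)."""
    parents = defaultdict(list)
    for derived, origin in derivations:
        parents[derived].append(origin)
    return _reachable(item, parents)


if __name__ == "__main__":
    for affected in sorted(affected_by_withdrawal("sample:blood-001", DERIVATIONS)):
        print("affected:", affected)
    for origin in sorted(sources_of("data:variants-1", DERIVATIONS)):
        print("source:", origin)
```

In a real deployment the derivation records would be spread across several organisations rather than held in one list, which is the problem the common provenance model is meant to address.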
In its current form, the standard is composed of the following 6 parts:
- Part 1 stipulates common requirements for provenance information management in biotechnology, to ensure compatibility of provenance management at all stages of research, and defines the design concept of this standard;
- Part 2 defines a common provenance model, which will serve as an overarching principle interconnecting the provenance parts generated by all kinds of contributing organisations and will enable access to provenance information in a distributed environment;
- Parts 3, 4 and 5 are meant to complement the horizontal standards (Parts 1 and 2) as vertical standards defining domain-specific provenance models that describe diverse stages or areas of research in biotechnology (e.g. sample acquisition and handling; analytical techniques; data management, cleansing and processing; database validation);
- Part 6 will contain optional data security extensions, especially to address non-repudiation of provenance.
The proposed structure is also depicted in Figure 1. Parts indicated by red boxes are considered horizontal standards, i.e. they provide a common basis for provenance information at all stages of research. The blue boxes indicate domain-specific vertical standards built on top of the horizontal standards.
The standard is currently at an early stage of development. The PROV model has already been used to define new types of provenance structures, called connectors, which interconnect provenance generated by different organizations. The concept of connectors and a common mechanism for versioning bundles have been published as an EOSC-Life project provenance deliverable [7]. A publication describing the use of connectors in a specific use case is in preparation, and its preprint will be published in summer 2021. The model will be continuously enriched with new types of structures (e.g. relations and entities) to capture common objects. These structures will subsequently be used to design provenance templates that define a common representation of usual scenarios in the life sciences. Further aspects will also be addressed.
The major focus areas are: opaque provenance components; privacy preservation and non-repudiation of provenance information; full syntactic and semantic interoperability of the provenance information captured; and a rigorous formal process for verifying the validity of provenance instances (provable compliance with the proposed model).
Another publication describing the standardization process in more detail is in preparation. It will explain our motivation and the standardization activity itself more fully, describe the structure of the standard in more detail, and discuss the openness of the standard and related issues.
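To illustrate how connectors could link provenance held by different organizations, the sketch below follows a chain of bundles backwards to its origin. It is an assumption-laden toy, not the connector definition from the EOSC-Life deliverable: the bundle identifiers, the `backward_connector` field and the `resolve_chain` function are all hypothetical, and the bundles sit in one in-memory dictionary instead of being fetched from separate organizations.

```python
# Hypothetical bundles, one per organisation. The "backward_connector" field
# is an assumed name for a pointer to the bundle holding the preceding part
# of the process chain; None marks the start of the chain.
BUNDLES = {
    "hospital:bundle-1": {
        "records": ["sample:blood-001 collected"],
        "backward_connector": None,
    },
    "biobank:bundle-7": {
        "records": ["sample:blood-001 stored", "sample:aliquot-1 prepared"],
        "backward_connector": "hospital:bundle-1",
    },
    "lab:bundle-3": {
        "records": ["data:sequence-1 generated from sample:aliquot-1"],
        "backward_connector": "biobank:bundle-7",
    },
}


def resolve_chain(bundle_id, bundles):
    """Follow backward connectors from `bundle_id` to the start of the chain.

    Returns the bundle identifiers ordered from origin to `bundle_id`.
    Raises KeyError for a dangling connector and ValueError for a cycle.
    """
    chain = []
    seen = set()
    current = bundle_id
    while current is not None:
        if current in seen:
            raise ValueError(f"connector cycle at {current}")
        if current not in bundles:
            raise KeyError(f"connector points to unknown bundle {current}")
        seen.add(current)
        chain.append(current)
        current = bundles[current]["backward_connector"]
    chain.reverse()
    return chain


if __name__ == "__main__":
    for step in resolve_chain("lab:bundle-3", BUNDLES):
        print(step, "->", BUNDLES[step]["records"])
```

The explicit checks for dangling connectors and cycles hint at why formal verification of provenance instances is listed among the focus areas: a chain assembled from independently produced parts can be broken in ways no single organisation can detect on its own.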

ISO 23494
The purpose of the standard is the standardization of provenance information for the biotechnology domain covering the whole process chain, from the source of biological material, through its processing, analysis, and all steps of data generation and processing.

Keywords: provenance • biotechnology • standardization

Supported by the European Union's Horizon 2020 research and innovation programme under grant agreement No. 654248, project CORBEL; grant agreement No. 824087, project EOSC-Life; and grant agreement No. 823830, project BioExcel-2.