Published June 2023 | Version v1
Project deliverable Open

D2.1 Global Data Sharing Standard

  • 1. Sirma Group (Bulgaria)
  • 2. Maastricht University


Ontologies are increasingly used to harmonise population data from heterogeneous data sources in support of clinical research, where each specific research question requires a well-defined dataset. AIDAVA is exploring the possibility of using an ontology to harmonise all patient data, extracted from heterogeneous data sources, into an individual personal health knowledge graph (PHKG) that can then be reused for multiple purposes in clinical care and clinical research.

The decision to take an ontology approach in AIDAVA, rather than to follow a structural standard such as an information model, was already made at proposal time: ontologies are semantically rich and agnostic of structural and syntactical formats, which potentially increases interoperability and reuse in compliance with the FAIR principles. Moreover, new knowledge can be added smoothly by extending the ontology concepts with RDF triples, and data quality constraints can be added through SHACL rules.
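As a rough illustration of the last point, the sketch below models RDF triples as plain Python tuples and checks one minimum-cardinality constraint in the spirit of SHACL's sh:minCount. It is a plain-Python stand-in, not real RDF or SHACL tooling, and it is not taken from the deliverable: the ex: identifiers, the MinCountShape type and the validate function are all hypothetical names chosen for the example.

```python
from typing import NamedTuple

RDF_TYPE = "rdf:type"  # shorthand for the RDF type predicate


class MinCountShape(NamedTuple):
    """Hypothetical stand-in for a SHACL node shape with one sh:minCount property."""
    target_class: str
    path: str
    min_count: int


def validate(triples, shape):
    """Return, sorted, the instances of shape.target_class that have
    fewer than shape.min_count distinct values for shape.path."""
    focus_nodes = sorted(
        s for s, p, o in triples if p == RDF_TYPE and o == shape.target_class
    )
    violations = []
    for node in focus_nodes:
        values = {o for s, p, o in triples if s == node and p == shape.path}
        if len(values) < shape.min_count:
            violations.append(node)
    return violations


# Illustrative data only; these identifiers are assumptions, not AIDAVA terms.
graph = {
    ("ex:patient1", RDF_TYPE, "ex:Patient"),
    ("ex:patient1", "ex:birthDate", "1970-01-01"),
    ("ex:patient2", RDF_TYPE, "ex:Patient"),
}
shape = MinCountShape(target_class="ex:Patient", path="ex:birthDate", min_count=1)

violations_before = validate(graph, shape)

# New knowledge is added as one more triple; no schema migration is involved.
graph.add(("ex:patient2", "ex:birthDate", "1985-06-15"))
violations_after = validate(graph, shape)
```

In this sketch, both the data and the constraint grow by adding entries: a further triple resolves the violation, and a further shape would express a further quality rule, without changing any fixed structure.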

Development of the AIDAVA Reference Ontology followed a structured approach including ideation, requirement analysis, design and development. The requirements took into account the use cases developed in WP1, the requirements extracted from the automation phases described in Task 2.1 and the annotation process described in Task 4.3. The data quality constraints were built in alignment with Task 4.2. We identified 4 Ontology Strategic Requirements and 6 Ontology Requirement Specifications that provided direction for the design and development of the ontology.

For an ontology like the AIDAVA Reference Ontology to comply with the FAIR principles as effectively as possible, it is critical to maximise alignment with emerging and existing standards. While reviewing the work on semantic interoperability of related initiatives, including TEHDAS and the European Electronic Health Record exchange format (EEHRxf), we came to the conclusion that SNOMED CT and LOINC were priority standards to be included. However, they need to be complemented by other standards to cover additional relationships and other domains. Several candidates were considered, and it was decided to include, as second priority, the semantics subsumed in the HL7 FHIR General Purpose Data Types and, through the governance process, relevant HL7 FHIR profiles. We expect that other semantic standards will be required to achieve the long-term objective of the AIDAVA Reference Ontology: to cover a majority of the medical concepts contained in personal health medical records.

This deliverable also describes the technical specification of the AIDAVA Reference Ontology, which defines the structure, components, and relationships within the scope of the two targeted use cases (Breast cancer registry and Cardiovascular score) and in a broader context (ensuring semantic interoperability across systems). It includes a formal representation of the concepts, entities and their attributes, which are specified in the AIDAVA Dataset.

While developing the ontology, we realised that additional concepts and relationships, as well as data quality constraints, will need to be added when the data sources to be curated are onboarded across sites and when more narrative texts are annotated. This requires a governance process to be executed during the project lifetime, as described in Section 3.4. In addition, assuming the project is successful, governance will also be needed beyond the project to maximise sustainability and reuse of the results. While this is not in scope of this deliverable, the proposed approach is introduced here; it will be discussed extensively during the planned meetings with the Sustainability Advisory Board.


AIDAVA_101057062_D2.1_Global Data Sharing Standard_final_zenodo.pdf

Files (2.0 MB)

Additional details


AIDAVA - AI powered Data Curation & Publishing Virtual Assistant 101057062
European Commission