Published October 29, 2021 | Version 2.0
Project deliverable | Open access

iHelp: Primary data capture and ingestion I


The iHelp integrated solution aims to provide personalised health monitoring and decision support based on artificial intelligence, using datasets that come from a variety of heterogeneous sources and are integrated into a common data model: the Holistic Health Records (HHR). The software components that make up the integrated iHelp platform can therefore be categorized into three major layers: (1) the components concerned with business intelligence, i.e. the artificial intelligence algorithms that consume data from the data management layer; (2) the building blocks that make up the data management layer itself, providing the runtime environment in which these analytics are executed, along with the central data repository; and (3) the building blocks responsible for capturing data from external sources and eventually ingesting it into the central data repository, after applying various data quality assurance functions during the ingestion process.
Regarding the third category of software components, the one related to data ingestion, both primary and secondary data need to be ingested. The distinction between the two is that primary data concerns static clinical information, while secondary data accumulates information that was initially captured at runtime and becomes available to the iHelp platform after pre-processing. This deliverable reports on the work carried out for the activities of Task 3.2: “Primary Data Capture and Ingestion”. These activities concern the capture of data from external sources (either primary data that needs to be captured directly from an external source, or secondary data that is captured from the output of the corresponding activities of Task 3.3: “Secondary Data Extraction and Interoperability”). The capture of these two types of data contributes towards the establishment of the data ingestion pipelines, so that the data can eventually be stored in the big data platform, converted to the common data model provided by the activities of Task 3.1: “Data Modelling and Integrated Health Records”. Other functions implemented as part of the data ingestion pipeline relate to data quality assurance; these are researched and realised in the scope of Task 3.4: “Standardisation and Quality Assurance of Heterogeneous Data”, and reported in D3.7: “Standardisation and Quality Assurance of Heterogeneous Data I”.
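To illustrate the shape of such a data ingestion pipeline, the following is a minimal sketch in Python: raw records are captured from a source, pass a quality assurance check, are converted to a common data model, and are then stored in a repository. All names here (`HHRRecord`, `quality_check`, `to_hhr`, `ingest`) are hypothetical and purely illustrative; they are not part of the actual iHelp platform or its HHR schema.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the HHR common data model
# (field names are illustrative, not the actual iHelp schema).
@dataclass
class HHRRecord:
    patient_id: str
    measurement: str
    value: float

def quality_check(raw: dict) -> bool:
    """Quality assurance step: reject records with missing fields
    or a non-numeric value."""
    required = ("patient_id", "measurement", "value")
    return all(k in raw for k in required) and isinstance(raw["value"], (int, float))

def to_hhr(raw: dict) -> HHRRecord:
    """Convert a captured raw record into the common data model."""
    return HHRRecord(raw["patient_id"], raw["measurement"], float(raw["value"]))

def ingest(source: list, repository: list) -> int:
    """Capture -> quality assurance -> conversion -> storage.
    Returns the number of records actually stored."""
    stored = 0
    for raw in source:
        if quality_check(raw):          # drop records failing QA
            repository.append(to_hhr(raw))
            stored += 1
    return stored

repo = []
captured = [
    {"patient_id": "p1", "measurement": "heart_rate", "value": 72},
    {"patient_id": "p2", "measurement": "heart_rate"},  # missing value -> rejected
]
ingest(repo=None, source=None) if False else None
n = ingest(captured, repo)
```

In this sketch `ingest` returns 1 for the sample input, since the second record fails the quality check and is filtered out before storage; in the real platform, each of these steps would be a separate, configurable component in the pipeline.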
This deliverable reports on the work carried out under Task 3.2: “Primary Data Capture and Ingestion”, and as such it focuses only on the objectives of that task. We start with an overview of the data pipelines, and then drill down into the design and implementation of their three main aspects: firstly, the software components related to data capture; secondly, the requirements that the involved data ingestion functions must comply with; and finally, how the deployment and establishment of such data pipelines will take place.
As this is the first version of this report, it provides the initial design of the involved components, along with different design approaches for the overall integration of the components that will participate in the data capture and ingestion pipelines. The benefits and drawbacks of each proposed design are discussed in detail, together with the requirements that each design brings. The second version of this document will include the decision on which approach will be followed, along with the final design of the involved components towards their updated implementation.


