D3.4 – Primary data capture and ingestion II
Description
The iHelp integrated solution aims to provide personalised health monitoring and decision support based on artificial intelligence, using datasets from a variety of heterogeneous sources that are integrated into a common data model: the Holistic Health Records (HHRs). The software components that make up the integrated iHelp platform can be categorised into three major layers: (1) the components involved in business intelligence, which use artificial intelligence algorithms that consume data from the data management layer; (2) the building blocks that make up the data management layer itself and provide both the central data repository and the runtime environment in which these analytics are executed; and (3) the building blocks responsible for capturing data from external sources and eventually ingesting it into the central data repository, after applying various data quality assurance functions during the ingestion process.
Regarding the third category of software components, which concerns data ingestion, both primary and secondary data need to be ingested. The distinction between the two is that primary data is static clinical information, while secondary data accumulates information that is initially captured at runtime and becomes available to the iHelp platform after pre-processing. This deliverable reports on the work carried out for the activities of T3.2 - “Primary Data Capture and Ingestion”. These activities concern the capture of data from external sources: either primary data, which needs to be captured directly from an external source, or secondary data, which is captured from the output of the corresponding activities of T3.3 - “Secondary Data Extraction and Interoperability”. The capture of these two types of data contributes to establishing the data ingestion pipelines, so that the data can eventually be stored in the big data platform, converted to the common data model provided by the activities of T3.1 - “Data Modelling and Integrated Health Records”. Other functions implemented as part of the data ingestion pipeline relate to data quality assurance, which is provided by the activities of T3.4 - “Standardisation and Quality Assurance of Heterogeneous Data”.
This deliverable reports on the work carried out under T3.2 - “Primary Data Capture and Ingestion” and therefore focuses only on the objectives of that task. It starts with an overview of the data pipelines and then drills down into the design and implementation of their three main aspects: first, the software components related to data capture; second, the requirements that the data ingestion functions involved must comply with; and finally, how such data pipelines are deployed and established.
This is the second version of this report. It adds more details on the implementation of each of the aforementioned components and on how their initial specification was extended. A new section describes our novel data logging mechanism, which takes into account the discussions with our reviewers following the demonstrator of the integrated data ingestion pipelines. This version also documents the decisions made among the different design approaches that were identified in the first phase of the project. Finally, an additional section presents the demonstrator of our data ingestion pipelines, providing concrete examples with code snippets that can be used as recipes for deploying such pipelines in different environments, which highlights the portability of our solution.
Files
iHelp_D3.4-Primary-data-capture-and-ingestion-II_v1.0.pdf (1.5 MB)
md5:5fb771b1b372e6a633c3d5b34eadcb70