Published July 12, 2024 | Version v1
Project deliverable | Open Access

D4.10 – Big data platform and knowledge management system II

Description

The iHelp integrated solution aims to provide personalised health monitoring and decision support based on artificial intelligence, using datasets that come from a variety of heterogeneous sources and are integrated into a common data model: the Holistic Health Records (HHRs). The integrated solution consists of several technology building blocks: firstly, the data ingestion process, which is responsible for capturing data from external sources, transforming them, and storing them in the Big Data Platform; secondly, the data analytics layer, which uses these data to feed its internal AI algorithms; and finally, the platform-level components, which provide the runtime execution environment and the data management activities of the integrated solution. As a result, the last category of building blocks is central to the iHelp platform and interacts with all other components.
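
As a toy illustration of what such a common data model enables, the sketch below normalises a record from one hypothetical external source into an HHR-like structure. The field names are invented for illustration only; the actual HHR schema is defined by the project.

```python
# Toy illustration of the "common data model" idea: a heterogeneous source
# record normalised into one HHR-like structure. All field names are
# invented for illustration; the actual HHR schema is defined by the project.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class HHRMeasurement:
    patient_id: str
    measurement: str
    value: float
    taken_at: datetime
    source: str  # which external system the record came from

def from_wearable(raw: dict) -> HHRMeasurement:
    """Map one hypothetical wearable-device payload into the common model."""
    return HHRMeasurement(
        patient_id=raw["user"],
        measurement=raw["metric"],
        value=float(raw["reading"]),
        taken_at=datetime.fromisoformat(raw["ts"]),
        source="wearable",
    )

print(from_wearable({"user": "p-001", "metric": "heart_rate",
                     "reading": "72", "ts": "2024-07-12T10:00:00"}))
```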


This deliverable reports the work that has recently been carried out under the scope of T4.4 – “Big Data Platform and Knowledge Management System”, which is responsible for the data management activities of the platform. The outcome of this task, the Big Data Platform of iHelp, will be used by i) the data ingestion processes that store data and ii) the data analytics functions that read data. As a result, it needs to allow for data ingestion at very high rates while, at the same time, enabling data analytics over the operational data being ingested. Moreover, the Big Data Platform needs to be integrated with the popular processing frameworks used by iHelp and its analytical functions, such as Apache Spark and Apache Kafka; it therefore provides various data connectivity mechanisms. Both the runtime execution environment and the data ingestion pipelines make use of intermediate Kafka queues, while Apache Spark is the analytical processing framework used by many developers of analytical tools.
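
As a simple illustration of these two access paths, the sketch below writes one record to an intermediate Kafka queue and reads operational data back through Spark. It is a minimal sketch only: the topic name, JDBC URL, table, and credentials are assumptions made for illustration, not the project's actual configuration.

```python
# Minimal sketch of the two access paths: Kafka for ingestion, Spark for
# analytics. Topic, URL, table, and credentials are illustrative only.
import json
from kafka import KafkaProducer          # pip install kafka-python
from pyspark.sql import SparkSession     # pip install pyspark

# Ingestion path: push a transformed record onto an intermediate queue.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("hhr-ingestion", {"patientId": "p-001",
                                "measurement": "glucose", "value": 5.4})
producer.flush()

# Analytics path: read the operational data back through Spark over JDBC
# (the matching JDBC driver jar must be on the Spark classpath).
spark = SparkSession.builder.appName("hhr-analytics").getOrCreate()
hhr_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://bigdata-platform:5432/hhr")  # hypothetical
    .option("dbtable", "hhr_measurements")                         # hypothetical
    .option("user", "analyst")
    .option("password", "change-me")
    .load()
)
hhr_df.groupBy("measurement").count().show()
```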


Another important requirement for the Big Data Platform is its ability to combine and aggregate data coming from different data sources. However, due to EU- and country-level regulations, the data must not be moved outside the organisation to which they belong. The Big Data Platform provides support for polyglot query processing, which will be a key factor in the implementation of this requirement.
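
The snippet below sketches the idea behind polyglot querying: a single SQL statement spanning two independently hosted datastores. All endpoints, table names, and schemas are hypothetical, and plain Spark JDBC views are used instead of the platform's own federation layer, so the example illustrates the query interface rather than the data-locality guarantees described above.

```python
# Sketch of a polyglot query: one SQL statement over two remote datastores.
# Endpoints and tables are hypothetical. Note that Spark pulls rows to its
# executors, whereas the platform's federation aims to keep data in place.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("polyglot-demo").getOrCreate()

def register(view: str, url: str, table: str) -> None:
    """Expose a remote JDBC table as a local SQL view."""
    (spark.read.format("jdbc")
        .option("url", url)
        .option("dbtable", table)
        .load()
        .createOrReplaceTempView(view))

register("hospital_a", "jdbc:postgresql://hospital-a.example:5432/hhr", "measurements")
register("hospital_b", "jdbc:mysql://hospital-b.example:3306/hhr", "measurements")

# One query that aggregates across both sources.
spark.sql("""
    SELECT measurement, COUNT(*) AS n
    FROM (SELECT * FROM hospital_a
          UNION ALL
          SELECT * FROM hospital_b) AS all_sites
    GROUP BY measurement
""").show()
```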


At this phase of the project, a prototype of the outcome of this task has been provided and made available to the project partners. It includes the Big Data Platform itself, along with the definition of the HHR relational schema; an enhanced Kafka broker that is currently being used by the data ingestion pipelines; the datastore connector; and a set of microservices that have been developed during the second phase of the project. This deliverable includes a separate section that demonstrates the deployment, installation, and use of each of the aforementioned components, giving concrete examples with code snippets that can be used by all project partners as development and operational guidelines.
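
In the same spirit as those operational guidelines, the following is a hypothetical smoke test for a deployed prototype. The service URL and endpoints are invented placeholders, not the deliverable's actual API.

```python
# Illustrative smoke test for a deployed datastore-connector microservice.
# BASE and the /health and /hhr/measurements endpoints are hypothetical.
import requests  # pip install requests

BASE = "http://localhost:8080"

# 1. Verify the microservice is up before wiring it into a pipeline.
assert requests.get(f"{BASE}/health", timeout=5).ok

# 2. Store one record through the connector ...
record = {"patientId": "p-001", "measurement": "glucose", "value": 5.4}
requests.post(f"{BASE}/hhr/measurements", json=record, timeout=5).raise_for_status()

# 3. ... and read it back, as an analytics component would.
rows = requests.get(f"{BASE}/hhr/measurements",
                    params={"patientId": "p-001"}, timeout=5).json()
print(rows)
```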


T4.4 – “Big Data Platform and Knowledge Management System” has three development phases, and this deliverable reports the work that has been carried out up to the second phase of the project (M22). It therefore also includes a section with the next steps and a roadmap for the implementation and prototype delivery of the Big Data Platform until the end of the project. The final version of this report is planned to be delivered in M32.

Files

iHelp_D4.10-Big-data-platform-and-knowledge-management-system-II_v1.0.pdf