Published December 31, 2021 | Version 1.0
Project deliverable Open

iHelp: Big data platform and knowledge management system I


The iHelp integrated solution aims at providing personalised health monitoring and decision support based on artificial intelligence, using datasets that come from a variety of heterogeneous sources and are integrated into a common data model: the Holistic Health Records (HHR). The integrated solution consists of several technology building blocks. The first group relates to the data ingestion process, which is responsible for capturing data from external sources, transforming them, and eventually storing them in the Big Data Platform. The second is the data analytics layer, which makes use of these data to feed its internal AI algorithms. The third comprises the platform-level components that provide the runtime execution environment and the data management activities of the integrated solution. As a result, this last category of building blocks is central to the iHelp platform and interacts with all other components.
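As a rough illustration of the capture–transform–store flow described above, the sketch below maps a raw source record into a simplified HHR-like structure. All field names (`patient_id`, `observation`, and so on) and the `to_hhr` helper are hypothetical stand-ins for illustration, not the project's actual HHR schema.

```python
from datetime import datetime, timezone

def to_hhr(raw: dict) -> dict:
    """Transform a raw source record into a simplified, hypothetical
    HHR-like structure (illustrative only; not the iHelp schema)."""
    return {
        "patient_id": str(raw["id"]),
        "observation": {
            "type": raw.get("metric", "unknown"),
            "value": float(raw["value"]),
            "unit": raw.get("unit", ""),
        },
        # Fall back to the capture time if the source gives no timestamp.
        "recorded_at": raw.get(
            "timestamp",
            datetime.now(timezone.utc).isoformat(),
        ),
        "source": raw.get("source", "external"),
    }

# Example: a record captured from a hypothetical wearable-data source.
raw_record = {"id": 42, "metric": "heart_rate", "value": "71", "unit": "bpm",
              "timestamp": "2021-12-01T10:00:00+00:00", "source": "wearable"}
hhr_record = to_hhr(raw_record)
```

In the actual pipeline, records like `hhr_record` would then be stored in the Big Data Platform rather than kept in memory.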
This deliverable reports the work that has been carried out so far under the scope of T4.4 (“Big Data Platform and Knowledge Management System”), which is responsible for the data management activities of the platform. The outcome of this task, the Big Data Platform of iHelp, will be used by i) the data ingestion processes that store data and ii) the data analytics functions that read data. As a result, it first needs to allow for data ingestion at very high rates while, at the same time, enabling data analytics over the operational data being ingested. Moreover, the Big Data Platform needs to be integrated with the popular processing frameworks used by iHelp and its analytical functions, such as Apache Spark and Apache Kafka, and it therefore provides various data connectivity mechanisms. Both the runtime execution environment and the data ingestion pipelines make use of intermediate Kafka queues, while Apache Spark is the analytical processing framework used by many developers of analytical tools.
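The decoupling via intermediate queues described above can be sketched without a running Kafka broker: below, an in-memory `deque` stands in for a Kafka topic, with the producer role mirroring the ingestion side and the consumer role mirroring the analytics side. The topic name and record shape are illustrative assumptions.

```python
from collections import deque

# In-memory stand-in for an intermediate Kafka topic (illustrative only;
# in the actual platform this would be a Kafka queue sitting between the
# ingestion pipeline and the downstream consumers).
topic_hhr_events = deque()

def produce(record: dict) -> None:
    """Ingestion side: publish a transformed record to the queue."""
    topic_hhr_events.append(record)

def consume_all() -> list:
    """Analytics side: drain the queue for downstream processing."""
    consumed = []
    while topic_hhr_events:
        consumed.append(topic_hhr_events.popleft())
    return consumed

# Producer and consumer never call each other directly; the queue
# decouples high-rate ingestion from analytical consumption.
for i in range(3):
    produce({"patient_id": str(i), "value": i * 10})
batch = consume_all()
```

The same decoupling is what lets the platform sustain high ingestion rates while analytics read the operational data independently.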
Another important requirement for the Big Data Platform is its ability to combine and aggregate data coming from different data sources. At the same time, due to EU- and country-level regulations, the data must not be moved outside the organisation they belong to; in fact, as clinical data are considered highly sensitive, data most likely cannot be moved outside the organisation at all. The Big Data Platform provides support for polyglot query processing, which will be a key factor in the implementation of this requirement.
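The no-data-movement requirement can be illustrated with a federated-style aggregation, in which each organisation's site answers a query locally and returns only aggregates, never patient-level records. The site functions, the `glucose` field, and the sample values below are purely hypothetical.

```python
# Each function simulates a query executed *inside* one organisation;
# only aggregate results (count and sum) leave the site, never the raw
# clinical records themselves.
def query_site_a():
    records = [{"glucose": 5.1}, {"glucose": 6.3}]  # stays within site A
    return {"n": len(records), "sum": sum(r["glucose"] for r in records)}

def query_site_b():
    records = [{"glucose": 7.0}]  # stays within site B
    return {"n": len(records), "sum": sum(r["glucose"] for r in records)}

def federated_mean(site_results):
    """Combine per-site aggregates into a global mean without moving
    patient-level data across organisational boundaries."""
    total_n = sum(r["n"] for r in site_results)
    total_sum = sum(r["sum"] for r in site_results)
    return total_sum / total_n

mean_glucose = federated_mean([query_site_a(), query_site_b()])
```

Polyglot query processing generalises this idea: each source is queried in its own query language and engine, and only the combined results are assembled centrally.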
At this phase of the project, a first prototype of the outcome of this task has been provided and is available to the project. This report includes a separate section that demonstrates its deployment, installation, and use, giving concrete examples with code snippets that can be used by all partners of the project as development and operational guidelines.
Task T4.4 (“Big Data Platform and Knowledge Management System”) has three development cycles, and this deliverable reports the work that has been carried out until M12. It therefore also includes a section with the next steps and a roadmap for the implementation and prototype delivery of the Big Data Platform in the next period. The second version is planned to be delivered in M24.