Published April 30, 2020 | Version v1
Project deliverable Open

D4.2 Refined Architecture, Initial System Prototype and Design of Software Stacks

Description

This deliverable describes the prototype implementation of the INFORE Architecture, which is defined in the deliverable D4.1 (Month 12) of the INFORE project. The following sections detail how the main objectives for the INFORE Architecture are achieved by initial implementations and prototypes of the components. The INFORE Architecture aims for a holistic, pluggable, extensible framework which supports the following objectives: 

  i. supporting the non-programmer data analyst in the rapid setup of streaming workflows tailored to the needs of her application scenario by providing graphical workflow design facilities,
  ii. automating the tuning of both the underlying Big Data platform infrastructure that materializes the visually designed workflow and the provisioned physical resources, in a way that optimizes specific performance metrics,
  iii. providing real-time, interactive machine learning and data mining tools that can be leveraged by the designed workflows,
  iv. enhancing interactivity via data summarization and approximate query processing techniques,
  v. providing distributed complex event processing and forecasting techniques that not only detect business events of interest as soon as they occur, but also forecast their occurrence well in advance.

 

These goals are achieved by implementing the following components: 

  • Graphical Editor Component
  • Connection Component
  • Manager Component
  • Optimizer Component
  • Synopsis Data Engine Component
  • Interactive Online Machine Learning Component
  • Complex Event Forecasting Component

 

The functionality described in deliverable D4.1 is achieved by the respective implementations of the components and the interactions between them. 

The RapidMiner Studio software is extended by the streaming extension. The extension provides the implemented Graphical Editor Component, the Connection Component and the Manager Component. 

In addition, it provides the integration with the Optimizer Component, the Synopsis Data Engine Component and the Interactive Online Machine Learning Component. The implementation of the integration of the Complex Event Forecasting Component is ongoing and will be reported in Deliverable D4.3 (Month 32). 

With these implementations, the streaming extension enables users of the INFORE Architecture to design streaming analysis workflows from logical streaming operations, without the need for programming or scripting skills. The designed workflow abstracts the streaming logic away from the actual physical implementations, so users do not need to concern themselves with such details. The streaming extension of RapidMiner provides the Streaming Optimization operator, which uses the capabilities of the Optimizer Component to optimize the designed workflow. The resulting physical workflow is then visualized in RapidMiner Studio and can be dispatched for execution by using the new Streaming Nest operator. This operator implements the functionality of the Manager Component: it creates streaming jobs specific to each streaming platform and submits them to the respective clusters.
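The flow from logical workflow to platform-specific jobs can be illustrated with a minimal sketch. It is not the INFORE implementation: the type names, the `optimize` and `build_jobs` functions, the cost table and the platform labels `platform_a` and `platform_b` are all hypothetical, and the cheapest-platform rule is only a stand-in for whatever the Optimizer Component actually does.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LogicalOperator:
    """A platform-independent step of the designed workflow (hypothetical type)."""
    name: str


@dataclass(frozen=True)
class PhysicalOperator:
    """A logical operator bound to a concrete streaming platform (hypothetical type)."""
    name: str
    platform: str


def optimize(workflow: List[LogicalOperator],
             costs: Dict[str, Dict[str, float]]) -> List[PhysicalOperator]:
    """Stand-in for the Optimizer Component: choose the cheapest platform per operator.

    The cost table is an assumption made for illustration only.
    """
    plan = []
    for op in workflow:
        platform = min(costs[op.name], key=costs[op.name].get)
        plan.append(PhysicalOperator(op.name, platform))
    return plan


def build_jobs(plan: List[PhysicalOperator]) -> Dict[str, List[str]]:
    """Stand-in for the Manager Component: group operators into one job per platform."""
    jobs: Dict[str, List[str]] = {}
    for op in plan:
        jobs.setdefault(op.platform, []).append(op.name)
    return jobs


# Logical workflow as the user would design it, with no platform details.
logical = [LogicalOperator("source"), LogicalOperator("filter"), LogicalOperator("aggregate")]

# Hypothetical per-operator, per-platform cost estimates.
costs = {
    "source": {"platform_a": 1.0, "platform_b": 2.0},
    "filter": {"platform_a": 1.5, "platform_b": 3.0},
    "aggregate": {"platform_a": 4.0, "platform_b": 2.5},
}

physical = optimize(logical, costs)   # physical workflow
jobs = build_jobs(physical)           # one job description per platform
```

In this sketch the user only ever writes the `logical` list; the platform assignment and the grouping into jobs happen without user involvement, which mirrors the separation between the designed workflow and its physical execution described above.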

The streaming extension also provides the new operators Synopsis Data Engine and Online Machine Learning. They use the interfaces described in deliverable D4.1 to integrate the functionality provided by the corresponding components into the streaming analysis workflow. The components can be configured through the graphical user interface of RapidMiner Studio and leveraged in the designed workflow.

This deliverable focuses on the integration of the various components that together constitute the initial prototype of the entire INFORE Architecture. The implementations and inner workings are detailed for the Optimizer Component in deliverable D5.1 (Month 16, together with this deliverable), the Synopsis Data Engine Component in deliverables D6.1 (Month 12) and D6.3 (Month 24), the Interactive Online Machine Learning Component and the Complex Event Forecasting Component in deliverables D6.2 (Month 16, together with this deliverable), D6.4 and D6.5.

Files

D4.2 Refined Architecture, Initial System Prototype and Design of Software Stacks.pdf

Additional details

Funding

European Commission
INFORE – Interactive Extreme-Scale Analytics and Forecasting 825070