Project deliverable Open Access
Project consortium members
Big Data processing workflows typically span a multitude of execution and storage platforms. Parts of the processing can be pushed down to the input sensors, as with the wavegliders in the Maritime use case, while more computationally intensive operators (such as stock correlation functions in the Financial use case or gene simulations in the Life Sciences use case) can be executed either within one or more (potentially distributed) Big Data platforms or on other clusters (e.g., GPU clusters) of a supercomputer. Even within a single supercomputer (e.g., BSC’s MareNostrum 4) one often finds several clusters with different hardware and processing capabilities, any of which could process a given workflow. Hence, the space of candidate physical execution plans for a Big Data workflow can be vast, and finding a plan that is both efficient and cost-effective in a timely fashion is not trivial.
This deliverable presents techniques for optimizing the execution of extreme-scale analytics workflows with respect to a set of optimization objectives (e.g., throughput, resource utilization) across different, potentially geo-dispersed computer clusters, each hosting one or more Big Data platforms.
WP5 interacts with WP4, since the Optimizer Component is a fundamental component of the overall INFORE architecture. WP5 receives a logical workflow as JSON-formatted input from the Graphical Editor Component of the architecture via the Manager Component. It ingests statistics collected by the Manager Component to perform cost estimation and to compare the performance of alternative execution plans; on that basis, the Optimizer Component transforms the logical workflow into a physical one to be deployed on the available computer clusters and Big Data platforms. Having performed this mapping, it returns the physical workflow to the Manager Component, which visualizes it in the Graphical Editor Component and deploys it to the available computer clusters. Moreover, WP5 interacts with the Synopses Data Engine Component and the Machine Learning and Data Mining Component of WP6, which provide the physical implementations of the respective logical operators drawn in the Graphical Editor Component during code-free workflow specification. Finally, WP5 optimizes the logical workflows that satisfy the application needs of the Biological (WP1), Financial (WP2) and Maritime (WP3) use cases.
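The logical-to-physical mapping described above can be illustrated with a minimal sketch. It is not the INFORE Optimizer: the JSON shape, the operator and platform names, the cost figures, and the functions `plan_cost` and `optimize` are all hypothetical, and exhaustive enumeration stands in for whatever search strategy the deliverable actually uses. The sketch only shows the general idea of scoring alternative placements with collected statistics and returning the cheapest one.

```python
import json
from itertools import product

# Hypothetical logical workflow in a made-up JSON shape (not the INFORE schema).
LOGICAL_WORKFLOW = json.loads("""
{
  "operators": ["source", "synopsis", "correlate"],
  "edges": [["source", "synopsis"], ["synopsis", "correlate"]]
}
""")

# Hypothetical statistics: estimated cost of each operator on each platform.
OPERATOR_COST = {
    "source": {"flink": 1.0, "spark": 3.0},
    "synopsis": {"flink": 4.0, "spark": 2.5},
    "correlate": {"flink": 6.0, "spark": 5.0},
}
# Hypothetical penalty for shipping data between two different platforms.
TRANSFER_COST = 1.5


def plan_cost(workflow, placement):
    """Estimated cost of one physical plan (operator -> platform mapping)."""
    cost = sum(OPERATOR_COST[op][placement[op]] for op in workflow["operators"])
    # Add a transfer penalty for every edge that crosses platforms.
    cost += sum(
        TRANSFER_COST
        for upstream, downstream in workflow["edges"]
        if placement[upstream] != placement[downstream]
    )
    return cost


def optimize(workflow, platforms):
    """Enumerate every placement and return the cheapest (placement, cost)."""
    operators = workflow["operators"]
    best = None
    for choice in product(platforms, repeat=len(operators)):
        placement = dict(zip(operators, choice))
        cost = plan_cost(workflow, placement)
        if best is None or cost < best[1]:
            best = (placement, cost)
    return best


physical_plan, cost = optimize(LOGICAL_WORKFLOW, ["flink", "spark"])
print(json.dumps({"placement": physical_plan, "estimated_cost": cost}))
```

With three operators and two platforms there are only 2^3 = 8 candidate plans, but the count grows exponentially with the number of operators, which is why exhaustive search of this kind does not scale to realistic workflows.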
D5.1 Operator Cost Estimation and Workflow Optimization Technology V1.pdf