Published June 16, 2016 | Version v1
Conference paper | Open access

A Comparison of Big Data Frameworks on a Layered Dataflow Model

  • 1. Computer Science Department, University of Torino. Torino, Italy
  • 2. Dépt. d'Informatique, Université du Québec à Montréal. Montréal, QC, Canada

Description

In the world of Big Data analytics, a number of tools aim to simplify the programming of applications that run on clusters. Each tool claims to provide better programming, data and execution models, usually described only by informal (and often confusing) semantics, yet all share a common underlying model, namely, the Dataflow model. The Dataflow model we propose shows how various tools share the same expressiveness at different levels of abstraction. The contribution of this work is twofold: first, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm), thus making it easier to understand high-level data-processing applications written in such frameworks. Second, we provide a layered model that can represent tools and applications following the Dataflow paradigm, and we show where each analyzed tool fits within its levels.
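The central claim above is that batch and streaming frameworks share an underlying Dataflow model. As a rough illustration only (this is not the paper's formal model and not any framework's API), a Spark-style chain such as flatMap, map, reduceByKey can be read as a pipeline of actors connected by FIFO channels, where an actor fires whenever a token is waiting on its input. All names here (`Actor`, `EOS`, `stateless`, `reduce_by_key`, `run`, `word_count_pipeline`) are hypothetical, and the end-of-stream marker is an assumption used to model a finite batch.

```python
from collections import Counter, deque

# Hypothetical end-of-stream marker: closes a finite (batch) stream.
EOS = object()


class Actor:
    """A dataflow node with one FIFO input channel; `fn` maps one token to output tokens."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn
        self.inbox = deque()


def stateless(f):
    """Wrap a per-token function so the end-of-stream marker passes straight through."""
    def step(token):
        return [EOS] if token is EOS else list(f(token))
    return step


def reduce_by_key():
    """Stateful node: sums (key, value) pairs and emits the totals only when the stream ends."""
    totals = Counter()

    def step(token):
        if token is EOS:
            return sorted(totals.items()) + [EOS]
        key, value = token
        totals[key] += value
        return []
    return step


def run(actors, source_tokens):
    """Feed a linear pipeline and fire enabled actors until every channel is empty."""
    actors[0].inbox.extend(source_tokens)
    actors[0].inbox.append(EOS)
    sink = []
    progressed = True
    while progressed:
        progressed = False
        for i, actor in enumerate(actors):
            while actor.inbox:  # an actor is enabled whenever its input channel holds a token
                for out in actor.fn(actor.inbox.popleft()):
                    if i + 1 < len(actors):
                        actors[i + 1].inbox.append(out)
                    elif out is not EOS:
                        sink.append(out)
                progressed = True
    return sink


def word_count_pipeline():
    """Spark-style flatMap -> map -> reduceByKey, expressed as three connected actors."""
    return [
        Actor("flatMap", stateless(lambda line: line.split())),
        Actor("map", stateless(lambda word: [(word, 1)])),
        Actor("reduceByKey", reduce_by_key()),
    ]


print(run(word_count_pipeline(), ["a b a", "b c"]))
# [('a', 2), ('b', 2), ('c', 1)]
```

Because `reduce_by_key` emits only at end of stream, this sketch models the batch case; a streaming variant could keep the same graph shape and instead emit partial results periodically (e.g., per window).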

Notes

https://arxiv.org/abs/1606.05293

Files

1606.05293v1.pdf (571.6 kB)
md5:5e296d1cbd903633ff94917f0536ab89

Additional details

Related works

Is previous version of
10.1142/S0129626417400035 (DOI)

Funding

TOREADOR – TrustwOrthy model-awaRE Analytics Data platfORm (688797)
European Commission
RePhrase – REfactoring Parallel Heterogeneous Resource-Aware Applications - a Software Engineering Approach (644235)
European Commission