Conference paper Open Access

A Comparison of Big Data Frameworks on a Layered Dataflow Model

Claudia Misale; Maurizio Drocco; Marco Aldinucci; Guy Tremblay

In Big Data analytics, a range of tools aims to simplify the programming of applications that run on clusters. Each tool claims to provide better programming, data, and execution models, yet generally only informal (and often confusing) semantics are given for them. All of these tools nonetheless share a common underlying model, namely the Dataflow model. The Dataflow model we propose shows how various tools share the same expressiveness at different levels of abstraction. The contribution of this work is twofold. First, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm), thus making it easier to understand high-level data-processing applications written in such frameworks. Second, we provide a layered model that can represent tools and applications following the Dataflow paradigm, and we show how the analyzed tools fit in each level.
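As an informal illustration of the idea that batch and streaming programs can be read as the same dataflow graph, the following minimal Python sketch composes two operators into a pipeline and feeds it first from an in-memory list and then from a generator. It is not taken from the paper and does not use the API of Spark, Flink, or Storm; the names `flat_map`, `map_`, `pipeline`, `count`, and `line_source` are hypothetical helpers introduced only for this example.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator

# Assumption for this sketch: an operator is a function from a stream of
# items to a stream of items, so a program is a chain of such operators.
Operator = Callable[[Iterable], Iterator]


def flat_map(fn) -> Operator:
    """Operator that emits every element produced by fn(item)."""
    def op(items):
        for item in items:
            yield from fn(item)
    return op


def map_(fn) -> Operator:
    """Operator that emits fn(item) for each incoming item."""
    def op(items):
        for item in items:
            yield fn(item)
    return op


def pipeline(*ops: Operator):
    """Connect operators in sequence: each output feeds the next input."""
    def run(source):
        stream = source
        for op in ops:
            stream = op(stream)
        return stream
    return run


def count(pairs):
    """Sink that sums the values seen for each key."""
    totals = Counter()
    for key, n in pairs:
        totals[key] += n
    return dict(totals)


# The same graph (split lines into words, then pair each word with 1).
word_pairs = pipeline(flat_map(str.split), map_(lambda w: (w.lower(), 1)))

# "Batch" input: a finite collection that is fully available up front.
batch_result = count(word_pairs(["a b", "b c"]))


def line_source():
    """'Streaming' input: items are produced one at a time."""
    yield "a b"
    yield "b c"


stream_result = count(word_pairs(line_source()))
```

In this sketch only the source differs between the two runs; the operator graph is unchanged, which is the sense in which a single dataflow description can cover both styles of execution.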

https://arxiv.org/abs/1606.05293