Journal article Open Access

DSPBench: A Suite of Benchmark Applications for Distributed Data Stream Processing Systems

Maycon Viana Bordin; Dalvan Griebler; Gabriele Mencagli; Claudio F. R. Geyer; Luiz G. L. Fernandes

Systems enabling the continuous processing of large data streams have recently attracted the attention of the scientific community and industrial stakeholders. Data Stream Processing Systems (DSPSs) are complex and powerful frameworks that ease the development of streaming applications in distributed computing environments such as clusters and clouds. Several systems of this kind have been released and are currently maintained as open source projects, such as Apache Storm and Spark Streaming. Benchmark applications have often been used by the scientific community to test and evaluate new techniques for improving the performance and usability of DSPSs. However, the existing benchmark suites lack representative workloads from the wide set of application domains that can benefit from the near real-time performance offered by the stream processing paradigm. The goal of this article is to present a new benchmark suite composed of 15 applications from areas such as Finance, Telecommunications, Sensor Networks, Social Networks and others. This article describes in detail the nature of these applications and their full workload characterization in terms of selectivity, processing cost, input size and overall memory occupation. In addition, it exemplifies the usefulness of the benchmark suite for comparing real DSPSs by selecting Apache Storm and Spark Streaming for this analysis.

Files (1.5 MB)
Paper DSPBench.pdf (1.5 MB)
md5:f3ad9cad62cf80a6e3261baedbffd951