Practical Storage-Compute Elasticity for Stream Data Processing

Gracia-Tinedo, Raúl; Junqueira, Flavio; Zhou, Brian; Xiong, Yimin; Liu, Luis

doi:10.1145/3626562.3626828

Published December 11, 2023 | Version v1

Conference paper, open access

1. Dell Technologies

Stream processing pipelines need to handle workload fluctuations (e.g., daily patterns, popularity spikes) by scaling up or down the resources allocated to running jobs. While there have been efforts to propose auto-scaling mechanisms for stream processing engines, prior work has overlooked the role of the storage system in ingesting and serving stream data. The absence of effective scaling for data streams is problematic, given that the number of parallel partitions of a data stream limits both the ingestion throughput of streaming data and the read parallelism of downstream streaming jobs. In this paper, we propose to augment the auto-scaling notion of stream processing engines with information about the source data stream. The key novelty of our approach lies in exploiting elastic data streams to ingest data, a unique feature of Pravega, a storage system for data streams that is part of Dell's Streaming Data Platform. Pravega streams can dynamically change their parallelism based on the ingestion workload, and this information can in turn be exploited to auto-scale the downstream streaming job. To this end, we have developed an Apache Flink connector for Pravega, as well as an auto-scaling orchestrator that feeds on data stream metrics. Our experiments show how a stream processing pipeline auto-scales by coordinating data stream and processing parallelism under workload fluctuations, with low operational cost.
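The coordination idea in the abstract, where the parallelism of the source stream drives the parallelism of the downstream job, can be illustrated with a minimal sketch. This is not the orchestrator from the paper: the policy class, its fields, and both function names are hypothetical, and the one-reader-per-segment default is an assumption chosen for illustration. Pravega calls the parallel partitions of a stream segments, so the sketch takes the current segment count as its input metric.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical policy knobs; none of these names come from the paper."""
    min_parallelism: int = 1
    max_parallelism: int = 32
    readers_per_segment: float = 1.0  # assumed ratio of job tasks to stream segments


def target_parallelism(active_segments: int, policy: ScalingPolicy) -> int:
    """Map the stream's current segment count to a desired job parallelism."""
    if active_segments < 0:
        raise ValueError("active_segments must be non-negative")
    desired = math.ceil(active_segments * policy.readers_per_segment)
    # Clamp to the configured bounds so the job never scales to zero or without limit.
    return max(policy.min_parallelism, min(policy.max_parallelism, desired))


def plan_rescale(current_parallelism: int, active_segments: int,
                 policy: ScalingPolicy) -> Optional[int]:
    """Return the new parallelism if a rescale is needed, otherwise None."""
    target = target_parallelism(active_segments, policy)
    return target if target != current_parallelism else None
```

In this sketch, a job running at parallelism 2 whose source stream has grown to 6 segments would be asked to rescale to 6, while a job already matching the segment count is left alone. A real orchestrator would also need to read the segment count from the storage system's metrics and trigger the rescale in the processing engine, both of which are outside the scope of this illustration.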

Files

pravega-industry-final.pdf (848.5 kB), md5:f97691da8daef8bc02057ed53b279ed8

Additional details

European Commission: NEARDATA - Extreme Near-Data Processing Platform (grant 101092644)
European Commission: CloudSkin - Adaptive virtualization for AI-enabled Cloud-edge Continuum (grant 101092646)

Repository URL: https://github.com/pravega/pravega-samples/tree/master/scenarios/pravega-flink-autoscaling
