Towards Multi-Tier Stream Data Tiering in the Cloud-Edge Continuum
Description
Event streaming systems (e.g., Apache Kafka, Apache Pulsar) are a popular substrate for ingesting data with low latency from continuous data sources, such as sensors, cameras, or server logs. Due to the sheer amount of data being stored as data streams, several systems are incorporating storage tiering as a core feature. However, in some cases, the design of the streaming system assumes a reliable connection to the external storage to which data is offloaded. Such a connection may not be available when deploying streaming pipelines in the Cloud-Edge Continuum.
In this paper, we evaluate deploying a streaming storage system with integrated data tiering (Pravega) in the Cloud-Edge Continuum. We find that while Pravega provides good IO performance, extended unavailability of the long-term storage service may impact stream data ingestion. This can be problematic in Edge use cases with stringent streaming ingestion and processing requirements. To mitigate this problem, we explore the concept of multi-tier long-term storage in Pravega. We implement this concept by integrating an ephemeral tiered storage system (GEDS) to augment Pravega with advanced data tiering mechanisms. Our preliminary results show that GEDS can exploit multiple storage tiers to increase the streaming system's tolerance to long-term storage unavailability by 3.8x.
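The intuition behind multi-tier buffering can be illustrated with a toy model. The following is a minimal sketch, not Pravega's or GEDS's actual API: the `Tier` and `TieredBuffer` classes, the `tolerated_outage_seconds` helper, and all capacities and rates are hypothetical and chosen only for illustration. It assumes that, while long-term storage is unreachable, incoming data fills an ordered list of local tiers, and ingestion stalls once all of them are full.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    """A hypothetical local storage tier (e.g., memory or local disk)."""
    name: str
    capacity_bytes: int
    used_bytes: int = 0

    def free_bytes(self) -> int:
        return self.capacity_bytes - self.used_bytes


class TieredBuffer:
    """Absorbs writes into an ordered list of tiers, first tier first."""

    def __init__(self, tiers):
        self.tiers = list(tiers)

    def write(self, nbytes: int) -> bool:
        """Spread nbytes across tiers; return False and write nothing if it does not fit."""
        if nbytes > sum(t.free_bytes() for t in self.tiers):
            return False
        remaining = nbytes
        for tier in self.tiers:
            chunk = min(remaining, tier.free_bytes())
            tier.used_bytes += chunk
            remaining -= chunk
            if remaining == 0:
                break
        return True

    def drain(self) -> int:
        """Simulate offloading all buffered data once long-term storage is reachable again."""
        drained = sum(t.used_bytes for t in self.tiers)
        for tier in self.tiers:
            tier.used_bytes = 0
        return drained


def tolerated_outage_seconds(tiers, ingest_bytes_per_s: int) -> int:
    """Count whole seconds of ingestion absorbed before a write no longer fits."""
    if ingest_bytes_per_s <= 0:
        raise ValueError("ingest rate must be positive")
    buf = TieredBuffer(tiers)
    seconds = 0
    while buf.write(ingest_bytes_per_s):
        seconds += 1
    return seconds


if __name__ == "__main__":
    # Illustrative units only: one tier of 100 units vs. an added tier of 300 units.
    print(tolerated_outage_seconds([Tier("memory", 100)], 10))
    print(tolerated_outage_seconds([Tier("memory", 100), Tier("disk", 300)], 10))
```

In this model the tolerated outage grows with the total capacity of the tiers divided by the ingestion rate. The numbers above are arbitrary and are not meant to reproduce the paper's measurements.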
Files
pravega_geds_cec.pdf (510.2 kB, md5:157bcc7553fe404c881601e570f63a12)