Optimizing Workflow Performance by Elucidating Semantic Data Flow
Contributors
Project managers:
Description
The combination of ever-growing scientific datasets and the complexity of distributed workflows creates I/O performance bottlenecks due to data volume, velocity, and variety. While the increasing use of descriptive data formats (e.g., HDF5) helps organize these datasets, it also introduces obscure bottlenecks by requiring the translation of high-level operations into file addresses and subsequent low-level I/O operations. To address this challenge, we propose using Semantic Dataflow Graphs to analyze (a) relationships between logical datasets and file addresses, (b) how dataset operations translate into different I/O behaviors and their performance, and (c) the time-ordered relationship of tasks and data across entire workflows. Our analysis and visualization enable the identification of performance bottlenecks and reasoning about performance improvements in workflows.
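As a minimal sketch of the idea, and not the authors' implementation, the following Python models a semantic dataflow graph as a list of time-stamped accesses linking tasks to logical datasets, each annotated with the file address range and low-level I/O counts it produced. Every name here (`Access`, `summarize_by_dataset`, `producer_consumer_edges`, `average_request_size`) and every value in the sample trace is hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """One edge of the graph: a task touching a logical dataset (hypothetical schema)."""
    task: str      # workflow task name
    dataset: str   # logical dataset name inside the file
    mode: str      # "read" or "write"
    start: float   # seconds since workflow start
    end: float     # seconds since workflow start
    offset: int    # first file address touched, in bytes
    nbytes: int    # bytes moved
    io_ops: int    # number of low-level I/O calls issued


def summarize_by_dataset(accesses):
    """(b) Aggregate bytes, I/O call counts, and time per (dataset, mode)."""
    totals = defaultdict(lambda: {"nbytes": 0, "io_ops": 0, "seconds": 0.0})
    for a in accesses:
        entry = totals[(a.dataset, a.mode)]
        entry["nbytes"] += a.nbytes
        entry["io_ops"] += a.io_ops
        entry["seconds"] += a.end - a.start
    return dict(totals)


def producer_consumer_edges(accesses):
    """(c) Derive time-ordered (writer, dataset, reader) edges across tasks."""
    last_writer = {}
    edges = []
    for a in sorted(accesses, key=lambda a: a.start):
        if a.mode == "write":
            last_writer[a.dataset] = a.task
        elif a.dataset in last_writer and last_writer[a.dataset] != a.task:
            edges.append((last_writer[a.dataset], a.dataset, a.task))
    return edges


def average_request_size(summary):
    """Bytes per low-level I/O call; small values hint at fragmented access."""
    return {key: s["nbytes"] / s["io_ops"] for key, s in summary.items() if s["io_ops"]}


# Invented sample trace: two tasks sharing two datasets in one file.
trace = [
    Access("simulate", "/grid/temperature", "write", 0.0, 2.0, 4096, 8_000_000, 2),
    Access("simulate", "/grid/labels", "write", 2.0, 3.5, 8_004_096, 64_000, 500),
    Access("analyze", "/grid/temperature", "read", 4.0, 5.0, 4096, 8_000_000, 2),
    Access("analyze", "/grid/labels", "read", 5.0, 9.0, 8_004_096, 64_000, 500),
]

summary = summarize_by_dataset(trace)
edges = producer_consumer_edges(trace)
sizes = average_request_size(summary)

for key in sorted(sizes, key=sizes.get):
    print(key, f"{sizes[key]:.0f} bytes per I/O call, {summary[key]['seconds']:.1f} s")
```

In this invented trace, the `offset` field carries relationship (a), mapping each logical dataset to a file address. Ranking by bytes per I/O call is just one plausible heuristic for surfacing datasets whose high-level operations fan out into many small low-level requests; a real analysis would draw these records from I/O tracing rather than hand-written values.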
Files
HUG_24_Semantic_Data_Flow_Poster__Horizontal_.pdf
Total size: 2.1 MB
Checksum | Size |
---|---|
md5:c6ab428667800086d90a3907620d0130 | 911.9 kB |
md5:96d7cfb29be244a63af4d1eb5eac437b | 1.2 MB |
Additional details
Related works
- Is supplemented by Video/Audio: https://youtu.be/3dY-V4O3Mf8
Funding
- Orchestration for Distributed & Data-Intensive Scientific Exploration, Office of Advanced Scientific Computing Research, United States Department of Energy
- A High-Performance Storage Infrastructure for Activity and Log Workloads (CSSI-2104013), U.S. National Science Foundation