Memory-Aware Latency Prediction Model for Concurrent Kernels in Partitionable GPUs: Simulations and Experiments
Creators
- Unimore
Description
The current trend in recently released Graphics Processing Units (GPUs) is to exploit
transistor scaling at the architectural level; hence, larger and larger GPUs are released with
every new chip generation. Architecturally, this implies that the number of clusters of parallel
processing elements embedded within a single GPU die is constantly increasing, posing novel and
interesting research challenges for performance engineering in latency-sensitive scenarios. A
single GPU kernel is now unlikely to scale linearly when dispatched to a GPU that features a
larger cluster count. This is due either to VRAM bandwidth acting as a bottleneck or to the
inability of the kernel to saturate the massively parallel compute power available in these
novel architectures.
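To make this concrete, the following is a minimal roofline-style sketch (not taken from the paper) of why adding clusters eventually stops reducing a kernel's runtime: all throughput and traffic figures are made-up placeholders, and the real behavior depends on the specific kernel and GPU.

```python
# Hypothetical roofline-style sketch (not from the paper): estimates how a single
# kernel's runtime scales with the number of GPU clusters assigned to it.
# FLOP counts, per-cluster throughput, and VRAM bandwidth are placeholder values.

def kernel_runtime(flops, bytes_moved, clusters,
                   flops_per_cluster=1.0e12, vram_bandwidth=8.0e11):
    """Runtime is bounded by whichever resource saturates first."""
    compute_time = flops / (clusters * flops_per_cluster)  # shrinks with more clusters
    memory_time = bytes_moved / vram_bandwidth             # fixed: VRAM is shared
    return max(compute_time, memory_time)

# Doubling the cluster count stops helping once the kernel becomes memory-bound.
for c in (4, 8, 16, 32):
    print(c, kernel_runtime(flops=2.0e12, bytes_moved=1.6e11, clusters=c))
```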
In this context, novel scheduling approaches might be derived if we consider the GPU as a
partitionable compute engine in which multiple concurrent kernels can be scheduled on non-
overlapping sets of clusters. While such an approach is very effective in improving overall GPU
utilization, it poses significant challenges in estimating kernel execution latencies when
kernels are dispatched to variable-sized GPU partitions. Moreover, memory interference among
co-running kernels is a mandatory aspect to consider. In this work, we derive a practical yet
fairly accurate memory-aware latency estimation model for co-running GPU kernels.
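As an illustration of what such an estimate can look like, the sketch below assigns two kernels to non-overlapping cluster partitions and charges each of them for the aggregate VRAM traffic of all co-runners. This is not the model derived in the paper; the interference term, constants, and partition sizes are assumptions chosen only to show the general shape of a memory-aware estimate.

```python
# Illustrative memory-aware latency estimate for co-running kernels on
# non-overlapping cluster partitions. All constants are placeholder assumptions;
# this is not the model from the paper.

def corun_latency(kernels, total_clusters,
                  flops_per_cluster=1.0e12, vram_bandwidth=8.0e11):
    """kernels: list of (flops, bytes_moved, clusters_assigned) tuples."""
    assert sum(c for _, _, c in kernels) <= total_clusters, "partitions must not overlap"
    # Assumed interference model: the VRAM traffic of all co-runners contends on the
    # same memory interface, so each kernel's memory phase is stretched by the
    # aggregate traffic rather than by its own traffic alone.
    aggregate_memory_time = sum(b for _, b, _ in kernels) / vram_bandwidth
    return [max(f / (c * flops_per_cluster), aggregate_memory_time)
            for f, _, c in kernels]

# Two kernels splitting a 16-cluster GPU into a 12-cluster and a 4-cluster partition.
print(corun_latency([(2.0e12, 4.0e10, 12), (5.0e11, 8.0e10, 4)], total_clusters=16))
```

In this toy example, the kernel on the smaller partition ends up bounded by the shared memory traffic rather than by its own compute time, which is exactly the kind of effect a memory-aware latency model has to capture.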
Files
- JSSPP23.pdf (1.9 MB, md5:d23a4c0e75aab5b3f646503ed4cad5f6)