GPU Performance Portability Using Standard C++ and SYCL

Delaney, Hugh

doi:10.5281/zenodo.7657431

Published February 20, 2023 | Version v1

Presentation Open

Delaney, Hugh¹

1. Codeplay Software

The proliferation of accelerators, in particular GPUs, over the past decade is
impacting the way software is being developed. Most developers who have been
using CPU-based machines are now considering how to improve the performance of
applications by offloading execution to many-core processors. Many emerging
disciplines including AI, deep neural networks and machine learning have shown
that GPUs can increase performance by many times compared to CPU-only
architectures. New hardware features such as “tensor cores” are also starting
to emerge to address specific problems including mixed-precision computing. The
new challenge for developers is figuring out how to develop for heterogeneous
architectures that include GPUs made by different companies. Currently the most
common way to develop software for GPUs is the CUDA programming model, but this
has pitfalls. CUDA uses non-standard C++ syntax and semantics, is a proprietary
interface, and can only be used to target Nvidia GPUs. Alternatives include
HIP, which offers another proprietary programming interface only capable of
targeting AMD GPUs.

This presentation will demonstrate how standard C++ code with SYCL can be used
to achieve performance portability on processors from multiple vendors,
including Nvidia GPUs, AMD GPUs and Intel GPUs. The SYCL programming interface
is a royalty-free, industry-defined open standard designed to enable the latest
features of accelerators. Using an open source project, we’ll show how standard
C++ syntax and semantics are used to define the SYCL kernel and memory
management code required to offload parallel execution to a range of GPUs. We’ll
also explain how easy it is to compile this C++ code using a SYCL compiler so
that it can be run on Nvidia, AMD and Intel GPUs, and compare its execution
performance with the same code written using the proprietary CUDA and HIP
environments.
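
To make the phrase "SYCL kernel and memory management code" concrete, here is a minimal sketch of a vector addition written against the SYCL 2020 interface. It is not taken from the presentation or from the open source project it mentions. The function name `vector_add` and the macro `SKETCH_HAVE_SYCL` are hypothetical, and the plain-loop fallback is an assumption added only so that the sketch still compiles where no SYCL toolchain is installed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: use SYCL only if its header is visible to the compiler.
// SKETCH_HAVE_SYCL is a hypothetical macro name used for this sketch only.
#if defined(__has_include)
#  if __has_include(<sycl/sycl.hpp>)
#    include <sycl/sycl.hpp>
#    define SKETCH_HAVE_SYCL 1
#  endif
#endif

// Hypothetical helper: element-wise sum of two equally sized vectors.
std::vector<float> vector_add(const std::vector<float>& a,
                              const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    if (a.empty()) return out;  // avoid creating zero-sized buffers

#ifdef SKETCH_HAVE_SYCL
    // The default-constructed queue lets the runtime pick a device
    // (a GPU if one is available, otherwise possibly the CPU).
    sycl::queue q;
    {
        // Memory management: buffers wrap the host data for the device.
        sycl::buffer<float, 1> buf_a(a.data(), sycl::range<1>(a.size()));
        sycl::buffer<float, 1> buf_b(b.data(), sycl::range<1>(b.size()));
        sycl::buffer<float, 1> buf_out(out.data(), sycl::range<1>(out.size()));

        q.submit([&](sycl::handler& h) {
            // Accessors declare how the kernel uses each buffer.
            sycl::accessor in_a(buf_a, h, sycl::read_only);
            sycl::accessor in_b(buf_b, h, sycl::read_only);
            sycl::accessor res(buf_out, h, sycl::write_only, sycl::no_init);

            // The kernel is an ordinary C++ lambda, run once per index.
            h.parallel_for(sycl::range<1>(out.size()), [=](sycl::id<1> i) {
                res[i] = in_a[i] + in_b[i];
            });
        });
    }  // buffer destructors wait for the kernel and copy results to `out`
#else
    // Fallback for this sketch only: sequential host loop, no offload.
    for (std::size_t i = 0; i < out.size(); ++i) out[i] = a[i] + b[i];
#endif
    return out;
}
```

Calling `vector_add({1, 2, 3}, {4, 5, 6})` returns a vector holding 5, 7 and 9. The same source is meant to be built by a SYCL compiler for whichever GPU is present; the compiler and target options depend on the toolchain and are not shown here.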

Files

Files (2.2 MB)

Name	Size	Checksum
WAMTA23 Hugh.pdf	2.2 MB	md5:2505aaf853c6c52c2afd33bb29548a72

	All versions	This version
Views	33	33
Downloads	24	24
Data volume	58.1 MB	58.1 MB
