Published December 15, 2023 | Version v1
Other Restricted

High Performance GPU offloading using XBLang: an extensible language front-end for MLIR

Creators

Description

Abstract:

This paper presents the design and development of XBLang, a language front-end for the Multi-Level Intermediate Representation (MLIR), created to open up access to the MLIR infrastructure, with a focus on GPU offloading.

The XBLang compiler combines an extensible high-level front-end, designed for creating and testing new language constructs, with a middle-end based entirely on MLIR. XBLang targets multicore CPU and GPU parallelism and runs successfully on NVIDIA and AMD GPUs and on CPUs such as the A64FX. Our results demonstrate speedups over, or performance comparable to, vendor compilers for GPUs.

To highlight one result from our evaluation: on an NVIDIA A100, the XBLang GPU version of NAS CG is 74x faster than Clang serial, 2.23x faster than Clang's OpenMP offload, and 1.17x faster than NVIDIA's OpenACC compilers; on an AMD MI250x, XBLang is 36x faster than Clang serial, 5x faster than Clang's OpenMP offload, and 6x faster than AMD Clang's OpenMP offload.
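The factors above are all stated relative to XBLang, so they also imply how the baselines compare with each other. The following is a minimal sketch of that arithmetic, assuming each "Nx faster" figure means time(baseline) / time(XBLang) measured on the same benchmark and device; the dictionary keys and the helper name are illustrative and do not come from the record.

```python
# Speedup factors quoted in the abstract for NAS CG on an NVIDIA A100.
# Assumption: each factor is time(baseline) / time(XBLang).
A100_XBLANG_SPEEDUP_OVER = {
    "clang_serial": 74.0,
    "clang_openmp_offload": 2.23,
    "nvidia_openacc": 1.17,
}


def implied_speedup(slower, faster, factors):
    """Speedup of `faster` over `slower`, implied by two XBLang-relative factors.

    If time(slower) / time(xblang) = a and time(faster) / time(xblang) = b,
    then time(slower) / time(faster) = a / b.
    """
    return factors[slower] / factors[faster]


openmp_vs_serial = implied_speedup(
    "clang_serial", "clang_openmp_offload", A100_XBLANG_SPEEDUP_OVER
)
print(f"Implied Clang OpenMP offload speedup over Clang serial: {openmp_vs_serial:.1f}x")
```

Under that reading, Clang's OpenMP offload on the A100 is itself roughly 33x faster than Clang serial, which puts XBLang's 2.23x margin over it in context.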

Note:

  • Each JSON file contains the raw timing information collected, along with other metadata such as the compiler version. Each PDF presents plots of the data in the corresponding JSON file.
  • XSBench.xb shows the computational kernel written in XBLang for the XSBench benchmark; we also include the output generated by the compiler for XSBench.xb after the code generation stage, concretization, and both lowerings.
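As a rough illustration of how raw timings like those in the JSON files might be summarized, the sketch below computes a median time per compiler and a speedup between two of them. The schema shown here (the "compiler_version", "benchmark", and "times_s" fields) and all the numbers are hypothetical placeholders; the actual layout of the files in this record is not described on this page and may differ.

```python
import json
from statistics import median

# Hypothetical example records; field names and values are placeholders,
# not the schema or data of the files in this record.
EXAMPLE_JSON = """
[
  {"compiler": "baseline", "compiler_version": "x.y", "benchmark": "demo",
   "times_s": [4.0, 4.2, 3.8]},
  {"compiler": "candidate", "compiler_version": "x.y", "benchmark": "demo",
   "times_s": [2.0, 2.1, 1.9]}
]
"""


def median_time_by_compiler(records):
    """Map each compiler name to the median of its raw timings (seconds)."""
    return {rec["compiler"]: median(rec["times_s"]) for rec in records}


def speedup(medians, baseline, candidate):
    """Speedup of `candidate` over `baseline` as a ratio of median times."""
    return medians[baseline] / medians[candidate]


records = json.loads(EXAMPLE_JSON)
medians = median_time_by_compiler(records)
demo_speedup = speedup(medians, "baseline", "candidate")
print(f"candidate is {demo_speedup:.2f}x faster than baseline")
```

The median is used here only because it is less sensitive to outlier runs than the mean; the paper may aggregate its timings differently.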

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.