Published August 12, 2024 | Version v2
Software | Open Access

High Performance Unstructured SpMM Computation Using Tensor Cores

  • 1. ETH Zurich
  • 2. Free University of Bozen-Bolzano
  • 3. University of Trento

Description

High-performance sparse matrix–matrix multiplication (SpMM) is paramount for science and industry, as the ever-increasing sizes of data prohibit the use of dense data structures. Yet, existing hardware, such as Tensor Cores (TC), is ill-suited for SpMM, as it imposes strict constraints on data structures that cannot be met by the unstructured sparsity found in many applications. To address this, we introduce (S)parse (Ma)trix Matrix (T)ensor Core-accelerated (SMaT): a novel SpMM library that utilizes TCs for unstructured sparse matrices. Our block-sparse library leverages the low-level CUDA MMA (matrix multiply-accumulate) API, maximizing the performance offered by modern GPUs. Algorithmic optimizations, such as sparse matrix permutation, further improve performance by minimizing the number of non-zero blocks. The evaluation on NVIDIA A100 shows that SMaT outperforms state-of-the-art libraries (DASP, cuSPARSE, and Magicube) by up to 125x (2.6x on average). SMaT can be used to accelerate many workloads in scientific computing, large-model training, inference, and others.
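To illustrate the permutation idea mentioned above, the toy sketch below (not SMaT's actual code; block size, matrix, and permutation are invented for illustration) shows why reordering the rows of an unstructured sparse matrix can shrink the number of non-zero blocks a block-sparse Tensor Core kernel must process:

```python
# Illustrative sketch: counting non-zero tiles of a block-sparse layout,
# before and after a row permutation. Real TC tiles are e.g. 16x8; a 2x2
# toy block size keeps the example readable.

BLOCK = 2  # hypothetical toy block size

def count_nonzero_blocks(mat, b=BLOCK):
    """Count b x b tiles that contain at least one non-zero entry."""
    n = len(mat)
    count = 0
    for bi in range(0, n, b):
        for bj in range(0, n, b):
            if any(mat[i][j] != 0
                   for i in range(bi, bi + b)
                   for j in range(bj, bj + b)):
                count += 1
    return count

# 4x4 matrix whose two non-zero rows are far apart: every non-zero
# lands in a different 2x2 tile, so all four tiles are "dense" work.
A = [
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]
print(count_nonzero_blocks(A))  # 4 non-zero blocks

# Permuting rows so the non-empty rows are adjacent packs the same
# non-zeros into half as many tiles.
perm = [0, 2, 1, 3]
A_perm = [A[i] for i in perm]
print(count_nonzero_blocks(A_perm))  # 2 non-zero blocks
```

Fewer non-zero blocks means fewer MMA tile operations launched on the Tensor Cores, which is where the performance gain of permutation comes from.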

Files (168.5 MB)

  • smat.zip — 168.5 MB (md5:1933a3799560592b0eefe0ca606d50a6)

Additional details

Additional titles

Alternative title
SMaT - (S)parse (Ma)trix Matrix (T)ensor Core-accelerated

Software

Repository URL
https://github.com/PatrikOkanovic/smat
Programming language
C++, CUDA, Python