TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs
Creators
1. MIT
2. MIT, Tsinghua University
3. Tsinghua University
4. UCSD
5. UC Berkeley
6. Shanghai Jiao Tong University
Description
Sparse convolution computation is important for AR/VR and ADAS applications. It involves sparse and irregular computation patterns, requiring specialized high-performance kernels. Existing GPU libraries offer two dataflow types for this workload: the gather-GEMM-scatter dataflow is easy to implement but suboptimal in performance, while dataflows with overlapped computation and memory access (e.g., implicit GEMM) are highly performant but carry very high engineering costs. In this work, we introduce TorchSparse++, a new GPU library that achieves the best of both worlds. We create a highly efficient Sparse Kernel Generator that generates performant sparse point cloud convolution kernels at less than one-tenth of the engineering cost of the current state-of-the-art system. On top of this, we design the Sparse Autotuner, which extends the design space of existing point cloud libraries and searches for the best dataflow configurations for training and inference workloads. Consequently, TorchSparse++ achieves 2.9x, 3.3x, 2.2x, and 1.7x measured end-to-end speedup on an NVIDIA A100 GPU over the state-of-the-art MinkowskiEngine, SpConv 1.2, TorchSparse, and SpConv v2 in inference, and is 1.2-1.3x faster than SpConv v2 in mixed-precision training.
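To make the two dataflows concrete, the gather-GEMM-scatter scheme mentioned above can be sketched in a few lines of NumPy. This is an illustrative sketch, not TorchSparse++ code: the function name `gather_gemm_scatter` and the kernel-map layout (one array of input/output index pairs per kernel offset) are assumptions chosen for clarity, and a real GPU library would fuse these stages into custom kernels rather than running them separately.

```python
import numpy as np

def gather_gemm_scatter(feats, weights, kmaps, n_out):
    """Sparse convolution via the gather-GEMM-scatter dataflow (sketch).

    feats:   (N_in, C_in) features of the non-zero input points
    weights: (K, C_in, C_out), one weight matrix per kernel offset
    kmaps:   list of K integer arrays of shape (M_k, 2); each row is an
             (input index, output index) pair active for that offset
    n_out:   number of output points
    """
    out = np.zeros((n_out, weights.shape[2]), dtype=feats.dtype)
    for w, kmap in zip(weights, kmaps):
        if len(kmap) == 0:
            continue                          # offset has no active pairs
        gathered = feats[kmap[:, 0]]          # gather: irregular read
        partial = gathered @ w                # GEMM: dense matrix multiply
        np.add.at(out, kmap[:, 1], partial)   # scatter: irregular accumulate
    return out
```

The separation into three stages is what makes this dataflow easy to implement (each stage maps to an existing primitive), but the intermediate `gathered` and `partial` buffers incur extra memory traffic, which is the performance gap that implicit-GEMM-style dataflows close by overlapping the data movement with computation.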
Files

Name | Size
---|---
torchsparse++-artifact-micro.zip (md5:19e5b0e5287f8f8ab67acf62dd2aee91) | 22.2 MB