Published August 30, 2018 | Version v1
Conference paper Open

GPU optimized math routines in the Stan math library

Description

This work implements GPU optimizations for the Cholesky decomposition and its derivative in the Stan Math library (Carpenter et al. 2015). The Stan library’s No-U-Turn sampler (NUTS) typically explores the target distribution more efficiently than alternative samplers, though it is computationally more expensive per log probability evaluation. This research is motivated by large Gaussian process (GP) models, where the log probability evaluation is very expensive and dominated by the inversion of the covariance matrix, which is typically done via the Cholesky decomposition. Experimental results show that the GPU optimizations are not optimal for small N × N matrices; however, matrices with N = 5000 can see speedups of 6x while retaining precision. This is the first known open-source GPU implementation of the Cholesky decomposition for automatic differentiation. Furthermore, the GPU kernels use OpenCL, so the implementation is not restricted to a particular GPU vendor.
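To illustrate why the Cholesky decomposition dominates the cost of a GP log probability evaluation, here is a minimal CPU sketch in pure Python. It is not the Stan Math or OpenCL implementation described in the report; the function names (`cholesky`, `forward_solve`, `gaussian_log_density`) are hypothetical and chosen only for illustration, and a zero-mean multivariate normal is assumed.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L L^T = a, for symmetric positive definite a."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: remove the contribution of columns already computed.
        s = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(s)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            l[i][j] = (a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))) / l[j][j]
    return l


def forward_solve(l, y):
    """Solve L z = y by forward substitution."""
    z = []
    for i, row in enumerate(l):
        z.append((y[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z


def gaussian_log_density(y, cov):
    """log N(y | 0, cov) from the Cholesky factor, without forming cov^{-1}."""
    n = len(y)
    l = cholesky(cov)                 # O(n^3): the dominant cost
    z = forward_solve(l, y)           # O(n^2)
    quad = sum(v * v for v in z)      # equals y^T cov^{-1} y
    log_det = 2.0 * sum(math.log(l[i][i]) for i in range(n))
    return -0.5 * (quad + log_det + n * math.log(2.0 * math.pi))
```

In this sketch the factorization costs on the order of N³ operations while the triangular solve costs on the order of N², so for large covariance matrices nearly all of the time is spent in `cholesky`, which is the step the work above moves to the GPU.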

Notes

Code and data available at github.com/stan-dev/stancon_talks

Files

report.pdf (543.1 kB)
md5:605ea52e60d32b96d4a625b4f160bd95