GPU-optimized math routines in the Stan Math library
Description
This work implements GPU optimizations for the Cholesky decomposition and its derivative in the Stan Math library (Carpenter et al. 2015). The Stan library’s No-U-Turn sampler (NUTS) typically explores the target distribution more efficiently than alternative samplers, though it is computationally more expensive per log probability evaluation. This research is motivated by large Gaussian Process (GP) models, where the log probability evaluation is very expensive and is dominated by the inversion of the covariance matrix, typically carried out via the Cholesky decomposition. Experimental results show that the GPU optimizations offer no advantage for small matrices; however, for N = 5000 matrices they achieve speedups of 6x while retaining precision. This is the first known open-source GPU implementation of the Cholesky decomposition for automatic differentiation. Furthermore, the GPU kernels are written in OpenCL, so the implementation is not restricted to a particular GPU vendor.
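To illustrate the kind of model that motivates this work, the following is a minimal sketch (not taken from the report) of a GP regression in the Stan modeling language. Each log probability and gradient evaluation inside NUTS is dominated by the cholesky_decompose call on the N × N covariance matrix, which is the operation the GPU routines target. The squared-exponential kernel (cov_exp_quad) and all data and parameter names here are illustrative assumptions, not details of the report.

```stan
data {
  int<lower=1> N;        // number of observations; Cholesky cost grows as O(N^3)
  real x[N];             // input locations
  vector[N] y;           // observed outputs
}
parameters {
  real<lower=0> alpha;   // marginal standard deviation of the GP
  real<lower=0> rho;     // length scale of the squared-exponential kernel
  real<lower=0> sigma;   // observation noise
}
model {
  // N x N covariance matrix; its Cholesky factor (and the factor's gradient)
  // dominates every log probability evaluation performed by NUTS
  matrix[N, N] K = cov_exp_quad(x, alpha, rho)
                   + diag_matrix(rep_vector(square(sigma), N));
  matrix[N, N] L = cholesky_decompose(K);

  alpha ~ normal(0, 1);
  rho ~ inv_gamma(5, 5);
  sigma ~ normal(0, 1);
  y ~ multi_normal_cholesky(rep_vector(0, N), L);
}
```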
Notes
Files

Name | Size
---|---
report.pdf (md5:605ea52e60d32b96d4a625b4f160bd95) | 543.1 kB