GPU-accelerated implementation of the storage-efficient QR decomposition
Creators
- 1. Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
Description
The LAPACK routines \(\texttt{GEQRT2}\) and \(\texttt{GEQRT3}\) compute the QR decomposition of an \(m \times n\) matrix together with the storage-efficient representation of the orthogonal factor, \(Q = I - VTV^T\). A GPU-accelerated algorithm is presented that extends a blocked CPU-GPU hybrid QR decomposition to also compute the triangular matrix \(T\). The storage-efficient representation is particularly useful for accessing blocks of \(Q\) without generating the whole matrix. The algorithm runs on a single GPU and aims to use memory efficiently so that matrices as large as possible can be processed. Reusing intermediate results significantly reduces the number of operations required. As a result, the algorithm outperforms the standard LAPACK routine by a factor of 3 for square matrices, which goes hand in hand with reduced energy consumption.
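To make the representation \(Q = I - VTV^T\) concrete, the following is a minimal, unblocked CPU sketch in plain Python. It is not the GPU algorithm of the paper. It assumes the usual Householder convention (unit lower trapezoidal \(V\), upper triangular \(T\)) and builds \(T\) column by column with the recurrence \(T_{1:j-1,\,j} = -\tau_j\, T_{1:j-1,\,1:j-1} V_{1:j-1}^T v_j\). The names `compact_wy_qr`, `q_block`, `matmul`, and `transpose` are hypothetical helpers introduced only for illustration.

```python
import math


def matmul(A, B):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def compact_wy_qr(A):
    """Unblocked Householder QR of an m x n matrix (m >= n).

    Returns (V, T, R) such that Q = I - V T V^T and A = Q R, where V is
    m x n with unit diagonal, T is n x n upper triangular, R is m x n.
    """
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    V = [[0.0] * n for _ in range(m)]
    T = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Householder reflector H = I - tau v v^T mapping column j to beta * e_1.
        x = [R[i][j] for i in range(j, m)]
        norm = math.sqrt(sum(xi * xi for xi in x))
        if norm == 0.0:
            tau, v = 0.0, [1.0] + [0.0] * (len(x) - 1)
        else:
            beta = -math.copysign(norm, x[0])
            tau = (beta - x[0]) / beta
            v = [1.0] + [xi / (x[0] - beta) for xi in x[1:]]
        # Apply the reflector to the trailing columns of R.
        for k in range(j, n):
            w = sum(v[i] * R[j + i][k] for i in range(len(v)))
            for i in range(len(v)):
                R[j + i][k] -= tau * v[i] * w
        for i in range(len(v)):
            V[j + i][j] = v[i]
        # Extend T by one column: t = -tau * T[:j, :j] @ (V[:, :j]^T v).
        z = [sum(V[i][p] * V[i][j] for i in range(j, m)) for p in range(j)]
        for p in range(j):
            T[p][j] = -tau * sum(T[p][q] * z[q] for q in range(j))
        T[j][j] = tau
    return V, T, R


def q_block(V, T, rows, cols):
    """Return Q[rows, cols] = I[rows, cols] - V[rows, :] T V[cols, :]^T
    without forming the full m x m matrix Q."""
    Vr = [V[i] for i in rows]
    Vc = [V[i] for i in cols]
    P = matmul(matmul(Vr, T), transpose(Vc))
    return [[(1.0 if r == c else 0.0) - P[a][b] for b, c in enumerate(cols)]
            for a, r in enumerate(rows)]
```

A call such as `q_block(V, T, [2, 3], [0, 1])` evaluates only the requested 2 x 2 block of \(Q\) from the corresponding rows of \(V\), which is the access pattern the abstract describes. A blocked or GPU variant would replace the explicit loops with matrix-matrix products.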
Files
Name | Size |
---|---|
Penke_paco_2017.pdf (md5:c5eaeddb25e800f214c1fa86dbf3d65a) | 261.3 kB |
Additional details
Related works
- Is supplemented by: 10.5281/zenodo.826567 (DOI)