Published June 22, 2017 | Version v1
Conference paper | Open access

GPU-accelerated implementation of the storage-efficient QR decomposition

  • 1. Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany

Description

The LAPACK routines \(\texttt{GEQRT2}\) and \(\texttt{GEQRT3}\) compute the QR decomposition of an \(m \times n\) matrix together with the storage-efficient representation \(Q = I - VTV^T\) of the orthogonal factor. A GPU-accelerated algorithm is presented that extends a blocked CPU-GPU hybrid QR decomposition to also compute the triangular matrix \(T\). The storage-efficient representation is used in particular to access blocks of \(Q\) without generating the whole matrix. The algorithm runs on a single GPU and uses memory economically in order to process matrices as large as possible. Reusing intermediate results significantly reduces the number of operations required. As a result, the algorithm outperforms the standard LAPACK routine by a factor of 3 for square matrices, which is accompanied by reduced energy consumption.
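To illustrate what the representation \(Q = I - VTV^T\) buys, here is a minimal CPU-only sketch, assuming a small dense real matrix stored as Python lists. It is not the paper's GPU algorithm and not LAPACK's code: `householder_qr` is a hypothetical unblocked Householder QR in the spirit of `GEQRT2`, `build_T` uses the forward column-wise recurrence documented for LAPACK `xLARFT`, and `q_columns` (also hypothetical) extracts selected columns of \(Q\) as \(Qe_j = e_j - V(T(V^Te_j))\) without ever forming \(Q\).

```python
import math

def householder_qr(A):
    """Unblocked Householder QR of an m x n list-of-lists A (m >= n).
    Returns (R, V, tau): R is n x n upper triangular, V is m x n unit lower
    trapezoidal, and H_k = I - tau[k] * v_k v_k^T with v_k = V[:, k]."""
    m, n = len(A), len(A[0])
    W = [row[:] for row in A]              # working copy, overwritten in place
    V = [[0.0] * n for _ in range(m)]
    tau = [0.0] * n
    for k in range(n):
        x = [W[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        V[k][k] = 1.0                      # implicit unit diagonal of V
        if norm_x == 0.0:
            continue                       # H_k = I, tau[k] stays 0
        alpha = -math.copysign(norm_x, x[0])   # sign chosen to avoid cancellation
        v = [x[0] - alpha] + x[1:]
        v = [t / v[0] for t in v]          # scale so that v[0] == 1
        tau[k] = 2.0 / sum(t * t for t in v)
        for i in range(k + 1, m):
            V[i][k] = v[i - k]
        for j in range(k, n):              # apply H_k to the trailing columns
            s = tau[k] * sum(v[i - k] * W[i][j] for i in range(k, m))
            for i in range(k, m):
                W[i][j] -= s * v[i - k]
    R = [[W[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return R, V, tau

def build_T(V, tau):
    """Upper triangular T such that H_0 H_1 ... H_{n-1} = I - V T V^T.
    Recurrence: T[:k, k] = -tau[k] * T[:k, :k] @ (V[:, :k]^T V[:, k])."""
    m, n = len(V), len(tau)
    T = [[0.0] * n for _ in range(n)]
    for k in range(n):
        w = [sum(V[i][j] * V[i][k] for i in range(m)) for j in range(k)]
        for r in range(k):
            T[r][k] = -tau[k] * sum(T[r][c] * w[c] for c in range(r, k))
        T[k][k] = tau[k]
    return T

def q_columns(V, T, cols):
    """Columns `cols` of Q = I - V T V^T, computed without forming Q.
    Returns a list of columns, each a list of length m."""
    m, n = len(V), len(T)
    out = []
    for j in cols:
        w = V[j][:]                                    # V^T e_j is row j of V
        z = [sum(T[r][c] * w[c] for c in range(r, n)) for r in range(n)]  # T w
        out.append([(1.0 if i == j else 0.0)
                    - sum(V[i][c] * z[c] for c in range(n)) for i in range(m)])
    return out

# Usage: the thin factor Q[:, :n] comes from the first n columns only.
A = [[4.0, 1.0, 2.0],
     [2.0, 3.0, 0.0],
     [0.0, 1.0, 5.0],
     [1.0, 0.0, 1.0]]
R, V, tau = householder_qr(A)
T = build_T(V, tau)
Q1 = q_columns(V, T, range(3))   # Q1[k] is column k of Q
```

Each requested column costs roughly \(O(mn + n^2)\) operations, so only the blocks of \(Q\) that are actually needed have to be materialized, which is the kind of access the description above refers to.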

Files (261.3 kB)

Penke_paco_2017.pdf (261.3 kB)
md5:c5eaeddb25e800f214c1fa86dbf3d65a

Additional details

Related works

Is supplemented by
10.5281/zenodo.826567 (DOI)