GPU-accelerated implementation of the storage-efficient QR decomposition

doi:10.5281/zenodo.815816

Published June 22, 2017 | Version v1

Conference paper | Open access

1. Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany

The LAPACK routines \(\texttt{GEQRT2}\) and \(\texttt{GEQRT3}\) compute the QR decomposition of a matrix of size \(m \times n\) together with the storage-efficient representation of the orthogonal factor, \(Q=I-VTV^T\). A GPU-accelerated algorithm is presented that extends a blocked CPU-GPU hybrid QR decomposition so that it also computes the triangular matrix \(T\). The storage-efficient representation is used in particular to access blocks of the matrix \(Q\) without having to generate all of it. The algorithm runs on a single GPU and aims to use memory efficiently in order to process matrices that are as large as possible. By reusing intermediate results, the number of required operations can be reduced significantly. As a result, the algorithm outperforms the standard LAPACK routine by a factor of 3 for square matrices, which goes hand in hand with reduced energy consumption.
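To make the representation \(Q=I-VTV^T\) concrete, the following is a minimal CPU-only NumPy sketch. It is not the paper's GPU algorithm and not the LAPACK implementation: it uses a plain unblocked Householder QR and accumulates \(T\) column by column with the standard forward recurrence. The function names `householder_qr_compact_wy` and `q_block` are hypothetical and chosen only for illustration. The second function shows how a block of \(Q\) can be formed from the corresponding rows of \(V\) without generating all of \(Q\).

```python
import numpy as np


def householder_qr_compact_wy(A):
    """Unblocked Householder QR (illustrative, assumes m >= n).

    Returns V (m x n, unit lower trapezoidal), T (n x n, upper triangular)
    and R (n x n, upper triangular) such that Q = I - V T V^T and
    A = Q[:, :n] @ R.
    """
    R = np.array(A, dtype=float)
    m, n = R.shape
    assert m >= n, "sketch assumes a tall or square matrix"
    V = np.zeros((m, n))
    T = np.zeros((n, n))
    for j in range(n):
        x = R[j:, j]
        norm_x = np.linalg.norm(x)
        v = np.zeros(m)
        v[j] = 1.0
        if norm_x == 0.0:
            tau = 0.0  # column is already zero: reflector is the identity
        else:
            alpha = -np.copysign(norm_x, x[0])  # sign chosen to avoid cancellation
            v[j:] = x
            v[j] -= alpha
            v[j:] /= v[j]  # normalise so that v[j] == 1
            tau = 2.0 / (v @ v)
        # Apply the reflector H_j = I - tau * v v^T to R.
        R -= tau * np.outer(v, v @ R)
        # Append H_j on the right: new column of T is -tau * T * V^T v.
        T[:j, j] = -tau * (T[:j, :j] @ (V[:, :j].T @ v))
        T[j, j] = tau
        V[:, j] = v
    return V, T, np.triu(R[:n, :])


def q_block(V, T, rows, cols):
    """Return Q[rows, cols] using only the needed rows of V."""
    m = V.shape[0]
    r = np.arange(m)[rows]
    c = np.arange(m)[cols]
    identity_block = (r[:, None] == c[None, :]).astype(float)
    return identity_block - V[rows, :] @ T @ V[cols, :].T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 4))
    V, T, R = householder_qr_compact_wy(A)
    Q = np.eye(6) - V @ T @ V.T
    print("reconstruction error:", np.linalg.norm(Q[:, :4] @ R - A))
    print("block error:", np.linalg.norm(q_block(V, T, slice(2, 5), slice(0, 3)) - Q[2:5, 0:3]))
```

In this sketch only \(V\) (stored below the diagonal in LAPACK's layout) and the small triangular \(T\) need to be kept. A block of \(Q\) with \(r\) rows and \(c\) columns then costs two small matrix products instead of forming the full \(m \times m\) matrix.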

Files (261.3 kB)

Name	Size
Penke_paco_2017.pdf (md5:c5eaeddb25e800f214c1fa86dbf3d65a)	261.3 kB

Additional details

Is supplemented by: 10.5281/zenodo.826567 (DOI)

