Published July 28, 2023 | Version 1
Journal article Open

High Performance Kernels for FFT via Modern C++

Description

Mathematical software for the Fast Fourier Transform

We present a library for computing the Fast Fourier Transform (FFT) with an
interface that fully embraces the principles of modern C++.
We support half, single, and double precision; in-place and out-of-place
transforms; scaling; arbitrary dimensions; and both complex and real
time-domain data.  We currently support AVX2 and AVX512 hardware.

The API of hpkfft is based on the abstract factory design pattern.
In-place and out-of-place transforms have distinct types, and these types are
templated on both precision and domain type to allow static type checking at
compile time.
Scaled and unscaled functions have distinct names, and the scale factor is
provided at each call site, not beforehand.
FFT compute objects are immutable, thread-safe, fully initialized at
construction, and managed by smart pointers.
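
The design points above can be illustrated with a minimal sketch. This is not
the hpkfft API: every name below (the sketch namespace, FftFactory,
OutOfPlaceFft, InPlaceFft, forward, scaledForward, impulseSpectrum) is
hypothetical, only the precision is a template parameter, and a naive O(n^2)
DFT stands in for the real kernels. It shows only how an abstract factory can
hand out immutable compute objects through smart pointers, with separate
in-place and out-of-place types and the scale factor passed at the call site.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <memory>
#include <vector>

// All names here are hypothetical; they are not taken from hpkfft.
namespace sketch {

// Out-of-place transform, templated on the precision type.
template <typename Real>
class OutOfPlaceFft {
public:
    using Complex = std::complex<Real>;
    virtual ~OutOfPlaceFft() = default;
    // Every method is const, so one object can be shared between threads.
    virtual void forward(const Complex* in, Complex* out) const = 0;
    virtual void scaledForward(Real scale, const Complex* in, Complex* out) const = 0;
};

// In-place transform: a different type, so the two cannot be mixed up.
template <typename Real>
class InPlaceFft {
public:
    using Complex = std::complex<Real>;
    virtual ~InPlaceFft() = default;
    virtual void forward(Complex* data) const = 0;
    virtual void scaledForward(Real scale, Complex* data) const = 0;
};

// Abstract factory: callers depend only on this interface.
template <typename Real>
class FftFactory {
public:
    virtual ~FftFactory() = default;
    virtual std::shared_ptr<const OutOfPlaceFft<Real>> makeOutOfPlace(std::size_t n) const = 0;
    virtual std::shared_ptr<const InPlaceFft<Real>> makeInPlace(std::size_t n) const = 0;
};

// Naive O(n^2) DFT used as a stand-in kernel; twiddles are fixed at construction.
template <typename Real>
class NaiveDft {
public:
    using Complex = std::complex<Real>;
    explicit NaiveDft(std::size_t n) : twiddle_(n) {
        const Real twoPi = Real(2) * std::acos(Real(-1));
        for (std::size_t m = 0; m < n; ++m)
            twiddle_[m] = std::polar(Real(1), -twoPi * Real(m) / Real(n));
    }
    std::size_t size() const { return twiddle_.size(); }
    void apply(Real scale, const Complex* in, Complex* out) const {
        const std::size_t n = twiddle_.size();
        for (std::size_t k = 0; k < n; ++k) {
            Complex sum{};
            for (std::size_t j = 0; j < n; ++j) sum += in[j] * twiddle_[(j * k) % n];
            out[k] = scale * sum;
        }
    }
private:
    std::vector<Complex> twiddle_;
};

template <typename Real>
class NaiveOutOfPlace final : public OutOfPlaceFft<Real> {
public:
    using Complex = std::complex<Real>;
    explicit NaiveOutOfPlace(std::size_t n) : dft_(n) {}
    void forward(const Complex* in, Complex* out) const override { dft_.apply(Real(1), in, out); }
    void scaledForward(Real scale, const Complex* in, Complex* out) const override {
        dft_.apply(scale, in, out);
    }
private:
    const NaiveDft<Real> dft_;
};

template <typename Real>
class NaiveInPlace final : public InPlaceFft<Real> {
public:
    using Complex = std::complex<Real>;
    explicit NaiveInPlace(std::size_t n) : dft_(n) {}
    void forward(Complex* data) const override { scaledForward(Real(1), data); }
    void scaledForward(Real scale, Complex* data) const override {
        // Per-call scratch copy keeps the object itself free of mutable state.
        const std::vector<Complex> copy(data, data + dft_.size());
        dft_.apply(scale, copy.data(), data);
    }
private:
    const NaiveDft<Real> dft_;
};

// Concrete factory; a vectorized backend could be swapped in behind FftFactory.
template <typename Real>
class NaiveFactory final : public FftFactory<Real> {
public:
    std::shared_ptr<const OutOfPlaceFft<Real>> makeOutOfPlace(std::size_t n) const override {
        return std::make_shared<NaiveOutOfPlace<Real>>(n);
    }
    std::shared_ptr<const InPlaceFft<Real>> makeInPlace(std::size_t n) const override {
        return std::make_shared<NaiveInPlace<Real>>(n);
    }
};

// Usage: transform a unit impulse of length n in double precision.
inline std::vector<std::complex<double>> impulseSpectrum(std::size_t n) {
    assert(n > 0);
    NaiveFactory<double> factory;
    const auto fft = factory.makeOutOfPlace(n);
    std::vector<std::complex<double>> in(n), out(n);
    in[0] = 1.0;
    fft->forward(in.data(), out.data());
    return out;
}

}  // namespace sketch
```

Because the in-place object accepts a single buffer and the out-of-place
object requires two, passing the wrong number of buffers fails to compile,
and a float plan cannot be applied to double data for the same reason.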

Overall, our results are more accurate than those of the leading vendor
library.
For half precision, the vendor library supports only 58 FFT lengths; on
those, our error is slightly lower, by a little over 3% in the geometric
mean.
Our error for AVX512 single precision is 15% lower in the mean; for AVX512
double precision, 25% lower.

Furthermore, our performance is generally higher.
In AVX512 half precision, our geometric mean performance on the
vendor-supported sizes is 15% higher.
That advantage grows to 20% when evaluated on a small batch.
For single and double precision, our AVX512 performance is over 30% higher
in the mean.  For AVX2, it is over 40% higher.
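
For readers unfamiliar with the summary statistic used above, the following
is a minimal sketch of a geometric mean over per-size ratios (for example,
our throughput divided by the vendor's at each FFT length). That the figures
are aggregated from such per-size ratios is an assumption, and the function
name geometricMean is hypothetical. The sum is taken in log space to avoid
overflow when many ratios are multiplied.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of strictly positive ratios, accumulated in log space.
// Hypothetical helper for illustration; not code from the paper.
inline double geometricMean(const std::vector<double>& ratios) {
    assert(!ratios.empty());
    double logSum = 0.0;
    for (double r : ratios) {
        assert(r > 0.0);  // the logarithm is undefined for non-positive values
        logSum += std::log(r);
    }
    return std::exp(logSum / static_cast<double>(ratios.size()));
}
```

A geometric mean treats a 2x speedup on one size and a 2x slowdown on another
as cancelling exactly, which an arithmetic mean of the same two ratios (1.25)
would not.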

Files

hpkfft-paper-2023.pdf (774.4 kB, md5:ddf0e6cb268c80afa40cd2e165c1328e)