Published February 28, 2013 | Version v1
Working paper Open

Auto-tuning of the FFTW Library for Massively Parallel Supercomputers

  • 1. CINECA, Italy

Description

This paper presents work carried out by CINECA within the PRACE-2IP project, which aimed to improve the performance of the FFTW library by refining the auto-tuning mechanism already implemented in it. The optimization comprised the following activities:

  • Identification of the major bottlenecks in the current FFTW implementation;
  • Investigation of the auto-tuning mechanism provided in FFTW, in order to understand how performance is affected by domain decomposition;
  • Introduction of a new parallel domain decomposition;
  • Construction of a library to improve the performance of the auto-tuning mechanism.

In particular, we compared the performance of the standard Slab Decomposition algorithm already present in FFTW with that of the new 2D Domain Decomposition, and found that on massively parallel supercomputers the new algorithm performs significantly better.
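As a rough illustration of why the choice of decomposition matters at large process counts, the following Python sketch computes the local block each process would own for a cubic n x n x n grid under a slab (1D) split and under a 2D split. It is not taken from the paper or from FFTW: the function names, the near-equal block distribution, and the row-major process grid are all assumptions made for this example.

```python
# Illustrative sketch only: these helpers are hypothetical and are not part
# of FFTW or of the library described in the paper.

def split_extent(n, parts, index):
    """Length of block `index` when n points are split into `parts` near-equal blocks."""
    base, extra = divmod(n, parts)
    return base + (1 if index < extra else 0)


def slab_local_shape(n, nprocs, rank):
    """Local block of an n x n x n grid under a slab (1D) split along the first axis."""
    return (split_extent(n, nprocs, rank), n, n)


def pencil_local_shape(n, prow, pcol, rank):
    """Local block under a 2D split over an assumed row-major prow x pcol process grid."""
    row, col = divmod(rank, pcol)
    return (split_extent(n, prow, row), split_extent(n, pcol, col), n)


def idle_ranks(shapes):
    """Number of processes whose local block is empty."""
    return sum(1 for shape in shapes if 0 in shape)


if __name__ == "__main__":
    n, nprocs = 256, 1024  # example sizes chosen for illustration
    slab = [slab_local_shape(n, nprocs, r) for r in range(nprocs)]
    pencil = [pencil_local_shape(n, 32, 32, r) for r in range(nprocs)]
    print("idle ranks, slab:", idle_ranks(slab))
    print("idle ranks, 2D:  ", idle_ranks(pencil))
```

With these example sizes, the slab split can give work to at most n = 256 processes, so the remaining 768 own empty blocks, while the 32 x 32 process grid assigns an 8 x 8 x 256 block to every process. The sketch covers only data distribution; it says nothing about communication cost, which the measurements in the paper address.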

Files

Auto-tuning_of_FFTW_library_for_massively_Parallel_Supercomputers-CINECA-PRACE2IP-WP12.1.pdf

Additional details

Funding

PRACE-2IP – PRACE - Second Implementation Phase Project 283493
European Commission