Exploring Functional Acceleration of OpenCL on FPGAs and GPUs Through Platform-Independent Optimizations
Description
OpenCL has been proposed as a means of accelerating functional computation using FPGA and GPU accelerators. Although it provides ease of programmability and code portability, questions remain about its performance portability and about the ability of the underlying vendor compilers to generate efficient implementations without user-defined, platform-specific optimizations. In this work, we systematically evaluate these questions by formalizing a design space exploration strategy that uses only platform-independent micro-architectural and application-specific optimizations. The optimizations are then applied across Altera FPGA, NVIDIA GPU and ARM Mali GPU platforms for three computing examples, namely matrix-matrix multiplication, binomial-tree option pricing and 3-dimensional finite difference time domain. Because the same design effort is used on every platform, our strategy enables a fair comparison in terms of throughput and energy efficiency. Our results indicate that the FPGA provides better performance portability than the NVIDIA GPU in terms of the achieved percentage of the device's peak performance (68% versus 20%), and that it also achieves better energy efficiency (up to 1.4X) for some of the considered cases, without requiring in-depth hardware design expertise.
Files
Name | Size |
---|---|
Exploring Functional Acceleration of OpenCL_QUB.pdf (md5:e42bbd9fc2cd62ad1192d41169e910b1) | 965.8 kB |