Analysis of SuperLU Solvers on Intel® MIC Architecture
Creators
- 1. Istanbul Technical University, National Center for High Performance Computing of Turkey (UHeM), Istanbul 34469, Turkey; Istanbul Technical University, Department of Mathematics, Istanbul 34469, Turkey
Contributors
- 1. Istanbul Technical University, National Center for High Performance Computing of Turkey (UHeM), Istanbul 34469, Turkey; Istanbul Technical University, Informatics Institute, Istanbul 34469, Turkey
Description
Intel Xeon Phi is a coprocessor with sixty-one cores on a single chip. The chip has a powerful vector FPU with 512-bit
SIMD registers, so it can benefit from algorithms that operate on large vectors. In this work, the sequential,
multithreaded and distributed versions of the SuperLU solvers are tested on the Intel Xeon Phi using the offload programming
model, and all three work well. Several offload programming alternatives exist, depending on where the pragma directives are
placed. We find that sequential SuperLU gains up to 45% in performance from offload programming, depending on the
sparse matrix type and the size of the transferred and processed data. On the other hand, the partitioning methods of
SuperLU_DIST and SuperLU_MT generate very small submatrices. We therefore observe that the matrix partitioning method and
several other tradeoffs influence their performance on the Xeon Phi architecture.
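Two of the tradeoffs named above can be made concrete with a little arithmetic. The sketch below is illustrative only: the helper names `lane_utilization` and `offload_speedup` are hypothetical, the timings passed to them are made-up inputs rather than measurements from this work, and the model ignores masking, alignment, and cache effects. It relies on one fact from the description, the 512-bit register width, which holds eight double-precision values.

```python
import math

SIMD_BITS = 512      # register width stated in the description
DOUBLE_BITS = 64     # IEEE 754 double precision
LANES = SIMD_BITS // DOUBLE_BITS  # 8 doubles fit in one register


def lane_utilization(column_length: int, lanes: int = LANES) -> float:
    """Fraction of SIMD lanes doing useful work on one dense column.

    A column of `column_length` doubles needs ceil(column_length / lanes)
    vector operations; the last one is only partly filled.
    """
    if column_length <= 0:
        raise ValueError("column_length must be positive")
    vector_ops = math.ceil(column_length / lanes)
    return column_length / (vector_ops * lanes)


def offload_speedup(host_s: float, device_s: float, transfer_s: float) -> float:
    """Speedup of offloaded execution over host-only execution.

    Assumed simple model: the offloaded run pays the coprocessor compute
    time plus the time to move data over the bus, with no overlap.
    """
    return host_s / (device_s + transfer_s)


if __name__ == "__main__":
    # Small submatrices leave most lanes idle; multiples of 8 fill them.
    for n in (3, 8, 20, 64):
        print(f"column length {n:3d}: utilization {lane_utilization(n):.3f}")

    # Hypothetical timings in seconds: the same kernel is a win or a loss
    # depending only on how much data must be transferred.
    print(offload_speedup(10.0, 5.0, 2.0))
    print(offload_speedup(10.0, 5.0, 6.0))
```

Under this model a column of three doubles uses 3 of 8 lanes, which is one way to see why very small submatrices are a poor fit for wide vector units, and offloading pays off only while the transfer time stays below the compute time saved.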
Files
Name | Size |
---|---|
WP135.pdf (md5:9461e3fead4e504faab6c69ca3f49a4a) | 190.9 kB |