Exploiting Locality in Sparse Matrix-Matrix Multiplication on the Many Integrated Core Architecture
Contributors
Other:
- 1. Bilkent University, Computer Engineering Department, 06800 Ankara, Turkey
Description
In this whitepaper, we propose outer-product-parallel and inner-product-parallel sparse matrix-matrix
multiplication (SpMM) algorithms for the Xeon Phi architecture and discuss the trade-offs between these two
parallelization schemes. We also propose two hypergraph-partitioning (HP) based methods for matrix
partitioning and row/column reordering that achieve temporal locality in these two parallelization schemes.
Both HP models try to minimize the total number of transfers from/to memory while maintaining balance on the
computational loads of the threads. Experimental results on realistic SpMM instances show that the Intel MIC
architecture has the potential to attain high performance in irregular applications as well as regular
applications. However, attaining high performance in irregular applications requires intelligent data and
computation reordering methods that make better use of temporal locality.
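As a rough illustration of the two formulations named above, the sketch below computes C = AB once as a sum of outer products (column k of A times row k of B) and once as a set of inner products (row i of A times column j of B). It uses the textbook definitions only and is not the whitepaper's implementation: the dictionary-based storage and the function names `outer_product_spmm`, `inner_product_spmm`, and `transpose` are assumptions made for this example, and the code is sequential.

```python
from collections import defaultdict


def transpose(by_major):
    """Turn {major: {minor: value}} into {minor: {major: value}}."""
    out = defaultdict(dict)
    for major, entries in by_major.items():
        for minor, value in entries.items():
            out[minor][major] = value
    return dict(out)


def outer_product_spmm(a_cols, b_rows):
    """C = sum over k of (column k of A) times (row k of B).

    a_cols[k] maps row index i to A[i, k]; b_rows[k] maps column index j
    to B[k, j]. Each k is an independent unit of work that reads its
    inputs once but scatters partial results into many entries of C.
    """
    c = defaultdict(float)
    for k, a_col in a_cols.items():
        b_row = b_rows.get(k)
        if not b_row:
            continue  # row k of B is empty, so this outer product is zero
        for i, a_ik in a_col.items():
            for j, b_kj in b_row.items():
                c[(i, j)] += a_ik * b_kj
    return dict(c)


def inner_product_spmm(a_rows, b_cols):
    """C[i, j] = (row i of A) dot (column j of B).

    a_rows[i] maps column index k to A[i, k]; b_cols[j] maps row index k
    to B[k, j]. Each C[i, j] is written exactly once, but rows of A and
    columns of B are read repeatedly.
    """
    c = {}
    for i, a_row in a_rows.items():
        for j, b_col in b_cols.items():
            shared = a_row.keys() & b_col.keys()
            if shared:
                c[(i, j)] = sum(a_row[k] * b_col[k] for k in shared)
    return c
```

In a parallel setting, the loop over k in the first function and the loop over i (or over (i, j) pairs) in the second are the natural units to distribute among threads. The first scheme then needs some way to combine concurrent updates to C, while the second rereads input data; which inputs and outputs end up being reused depends on the order in which rows and columns are processed.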
Files
Name | Size | Checksum |
---|---|---|
WP144.pdf | 180.6 kB | md5:a64dcbe50dbb6f6e3634b64ac6834dee |