Performance measurements for "Bringing Order to Sparsity: A Sparse Matrix Reordering Study on Multicore CPUs"
Description
The paper "Bringing Order to Sparsity: A Sparse Matrix Reordering Study on Multicore CPUs" compares various strategies for reordering sparse matrices. The purpose of reordering is to improve performance of sparse matrix operations, for example, by reducing fill-in resulting from sparse Cholesky factorisation or improving data locality in sparse matrix-vector multiplication (SpMV). Many reordering strategies have been proposed in the literature and the current paper provides a thorough comparison of several of the most popular methods.
This comparison is based on performance measurements that were collected on the eX3 cluster, a Norwegian experimental research infrastructure for exploration of exascale computing. These measurements are gathered in the data set provided here, which covers the performance of two SpMV kernels for 490 sparse matrices, 7 matrix orderings (the original ordering and 6 reorderings) and 8 multicore CPUs.
Experimental results are provided in a human-readable, tabular format using plain-text ASCII. This format may be readily consumed by gnuplot to create plots or imported into commonly used spreadsheet tools for further analysis.
Performance measurements are provided for an SpMV kernel that uses the compressed sparse row (CSR) storage format, for each of the 7 matrix orderings. One file is provided for each of the 8 multicore CPU systems considered in the paper:
1. Skylake: csr_all_xeongold16q_032_threads_ss490.txt
2. Ice Lake: csr_all_habanaq_072_threads_ss490.txt
3. Naples: csr_all_defq_064_threads_ss490.txt
4. Rome: csr_all_rome16q_016_threads_ss490.txt
5. Milan A: csr_all_fpgaq_048_threads_ss490.txt
6. Milan B: csr_all_milanq_128_threads_ss490.txt
7. TX2: csr_all_armq_064_threads_ss490.txt
8. Hi1620: csr_all_huaq_128_threads_ss490.txt
A corresponding set of files with performance measurements is provided for a second SpMV kernel that is also studied in the paper.
Each file consists of 490 rows and 54 columns. Each row corresponds to a different matrix from the SuiteSparse Matrix Collection (https://sparse.tamu.edu/). The first 5 columns specify some general information about the matrix, such as its group and name, as well as the number of rows, columns and nonzeros. Column 6 specifies the number of threads used for the experiment (which depends on the CPU). The remaining columns are grouped according to the 7 different matrix orderings that were studied, in the following order: original, Reverse Cuthill-McKee (RCM), Nested Dissection (ND), Approximate Minimum Degree (AMD), Graph Partitioning (GP), Hypergraph Partitioning (HP), and Gray ordering. For each ordering, the following 7 columns are given:
1. Minimum number of nonzeros processed by any thread in the SpMV kernel
2. Maximum number of nonzeros processed by any thread in the SpMV kernel
3. Mean number of nonzeros processed per thread in the SpMV kernel
4. Imbalance factor, which is the ratio of the maximum to the mean number of nonzeros processed per thread in the SpMV kernel
5. Time (in seconds) to perform a single SpMV iteration; this was measured by taking the minimum out of 100 SpMV iterations performed
6. Maximum performance (in Gflop/s) for a single SpMV iteration; this was measured by taking twice the number of matrix nonzeros and dividing by the minimum time out of 100 SpMV iterations performed.
7. Mean performance (in Gflop/s) for a single SpMV iteration; this was measured by taking twice the number of matrix nonzeros and dividing by the mean time of the 97 last SpMV iterations performed (i.e., the first 3 SpMV iterations are ignored).
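The definitions above can be sketched in Python as follows. This is a minimal illustration of how the seven values relate to raw measurements, not the code used to produce the data set. The function name `spmv_metrics` and its parameters are hypothetical; the only inputs assumed are the number of nonzeros handled by each thread, the total number of matrix nonzeros, and the wall-clock time of each of the 100 SpMV iterations.

```python
from statistics import mean


def spmv_metrics(nnz_per_thread, matrix_nnz, iteration_times, warmup=3):
    """Derive the seven per-ordering values from raw measurements.

    nnz_per_thread  -- nonzeros processed by each thread (hypothetical input)
    matrix_nnz      -- total number of nonzeros in the matrix
    iteration_times -- time in seconds of every SpMV iteration, in order
    warmup          -- leading iterations excluded from the mean (3 in the text)
    """
    min_nnz = min(nnz_per_thread)
    max_nnz = max(nnz_per_thread)
    mean_nnz = sum(nnz_per_thread) / len(nnz_per_thread)

    # Imbalance factor: maximum over mean nonzeros per thread.
    imbalance = max_nnz / mean_nnz

    # Time for a single iteration: minimum over all iterations.
    min_time = min(iteration_times)

    # SpMV performs one multiply and one add per nonzero, hence 2 * nnz flops.
    flops = 2 * matrix_nnz
    max_gflops = flops / min_time / 1e9

    # Mean performance ignores the first `warmup` iterations.
    mean_time = mean(iteration_times[warmup:])
    mean_gflops = flops / mean_time / 1e9

    return {
        "min_nnz": min_nnz,
        "max_nnz": max_nnz,
        "mean_nnz": mean_nnz,
        "imbalance": imbalance,
        "time_s": min_time,
        "max_gflops": max_gflops,
        "mean_gflops": mean_gflops,
    }
```

An imbalance factor of 1 means every thread processes the same number of nonzeros; larger values indicate that the most heavily loaded thread does proportionally more work than the average.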
The results in Fig. 1 of the paper show the speedup (or slowdown) resulting from reordering for 3 reorderings and 3 selected matrices. These results can be reproduced by inspecting the performance results that were collected on the Milan B and Ice Lake systems for the three matrices Freescale/Freescale2, SNAP/com-Amazon and GenBank/kmer_V1r. Specifically, the numbers displayed in the figure are obtained by dividing the maximum performance measured for the respective orderings (i.e., RCM, ND and GP) by the maximum performance measured for the original ordering.
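The speedup calculation described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a script shipped with the data set: it assumes that fields are separated by whitespace, that the group and name are the first two fields, and that the columns follow the layout described earlier (6 leading columns, then 7 columns for each of the 7 orderings). That layout adds up to 55 columns, one more than the 54 stated above, so the offsets should be checked against the actual files before use. The names `column_index`, `find_row` and `speedup_from_row` are hypothetical.

```python
ORDERINGS = ["original", "RCM", "ND", "AMD", "GP", "HP", "Gray"]
METRICS = ["min_nnz", "max_nnz", "mean_nnz", "imbalance",
           "time_s", "max_gflops", "mean_gflops"]
NUM_LEADING_COLUMNS = 6  # assumed: 5 matrix-info columns plus thread count


def column_index(ordering, metric):
    """Return the 0-based field index for an (ordering, metric) pair."""
    block = ORDERINGS.index(ordering)
    offset = METRICS.index(metric)
    return NUM_LEADING_COLUMNS + block * len(METRICS) + offset


def find_row(lines, group, name):
    """Return the split fields of the row for group/name, or None.

    Assumes whitespace-separated fields with group and name first;
    lines that do not match (e.g. headers or comments) are skipped.
    """
    for line in lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] == group and fields[1] == name:
            return fields
    return None


def speedup_from_row(fields, reorderings=("RCM", "ND", "GP")):
    """Divide each reordering's maximum performance by the original's."""
    base = float(fields[column_index("original", "max_gflops")])
    return {
        name: float(fields[column_index(name, "max_gflops")]) / base
        for name in reorderings
    }
```

For example, reading the lines of the Milan B file, passing them to `find_row` together with a group and name such as `"SNAP"` and `"com-Amazon"`, and then calling `speedup_from_row` on the result would give one ratio per reordering. A value above 1 indicates a speedup and a value below 1 a slowdown relative to the original ordering.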
The results presented in Figs. 2 and 3 of the paper show the speedup of SpMV as a result of reordering for the two SpMV kernels considered in the paper. In this case, gnuplot scripts are provided to reproduce the figures from the data files described above.
Files
matrix_reordering_survey_artifacts.zip
Size: 990.3 kB
md5:23d43ee479fe1b393ca9f409704658bf