Published February 19, 2019 | Version 1.0
Software Open

MP CBM-Z: Design a new architecture of CBMZ gas-phase chemical mechanism for the next generation processors

  • 1. Beijing Normal University
  • 2. Intel (China) Corporation
  • 3. Chinese Academy of Sciences

Description

Precise and rapid air quality simulations and forecasting are limited by the computational performance of the air quality model used, and the gas-phase chemistry module is the most time-consuming function in the air quality model. In this study, we designed a new framework for the widely used Carbon Bond Mechanism Z (CBM-Z) gas-phase chemical kinetics kernel that adapts it to the single-instruction, multiple-data (SIMD) technology in next-generation processors and improves its calculation performance. The optimization implements fine-grained parallelization of CBM-Z by improving its vectorization ability. By constructing loops and integrating the main branches, e.g., the diverse chemistry sub-schemes, multiple spatial points in the model can be processed simultaneously on vector processing units (VPUs). Two generations of CPUs (Intel Xeon E5-2680 V4 and Intel Xeon Gold 6132) and the Intel Xeon Phi 7250 Knights Landing (KNL) are used as the benchmark processors. Validation of the CBM-Z module outputs indicates that the relative bias reaches a maximum of 0.025% after 10 h of integration with the -fp-model fast=1 compile flag. The module tests show that the Multiple-Points CBM-Z (MP CBM-Z) achieved 5.16x and 8.97x speedups on a single core of the Intel Xeon E5-2680 V4 and Intel Xeon Gold 6132 CPUs, respectively, and KNL had a speedup of 3.69x compared with the performance of CBM-Z on the Intel Xeon E5-2680 V4 platform. For the single-node tests, the speedup on the two generations of CPUs can reach 104.63x and 198.50x using the message passing interface (MPI) and 101.02x and 194.60x using OpenMP, and the speedup on the KNL node can reach 175.23x using MPI and 167.45x using OpenMP. The speedup of the optimized CBM-Z is approximately 40% higher on a one-socket KNL platform than on a two-socket Broadwell platform and about 13%–16% lower than on a two-socket Skylake platform.
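The restructuring described above (looping over spatial points and merging branches so the VPU can process several points at once) can be illustrated with a minimal C sketch. It is not taken from the MP CBM-Z source: the function names, the two made-up rate constants, and the one-step "decay" update are hypothetical stand-ins for a branchy chemistry kernel. The last function shows one plausible way to compute a maximum relative bias between a reference and a test output, in the spirit of the validation figure quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy kernel, NOT the CBM-Z mechanism: each grid point
 * belongs to one of two sub-schemes (regime 0 or 1) and takes a single
 * explicit decay step. Rate constants are invented for illustration. */

/* Single-point style: one spatial point per call, with a
 * data-dependent branch that hinders vectorization. */
static double step_one_point(double conc, int regime, double dt)
{
    double k;
    if (regime == 0) {
        k = 0.10;   /* sub-scheme A (made-up value) */
    } else {
        k = 0.25;   /* sub-scheme B (made-up value) */
    }
    return conc * (1.0 - k * dt);
}

/* Multi-point style: the loop runs over spatial points and the branch
 * is folded into a select, so every iteration executes the same
 * instructions and a compiler may map the loop onto SIMD lanes. */
static void step_many_points(const double *conc, const int *regime,
                             double *out, size_t n, double dt)
{
    assert(conc != NULL && regime != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        double k = (regime[i] == 0) ? 0.10 : 0.25;
        out[i] = conc[i] * (1.0 - k * dt);
    }
}

/* Maximum relative bias, in percent, of test[] against ref[].
 * Points where the reference is exactly zero are skipped. */
static double max_relative_bias_percent(const double *ref,
                                        const double *test, size_t n)
{
    double worst = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (ref[i] == 0.0) {
            continue;
        }
        double diff = test[i] - ref[i];
        double base = ref[i];
        if (diff < 0.0) diff = -diff;
        if (base < 0.0) base = -base;
        double bias = diff / base * 100.0;
        if (bias > worst) worst = bias;
    }
    return worst;
}
```

A caller would fill `conc` and `regime` for a block of grid points, call `step_many_points` once per time step, and compare the result against per-point calls to `step_one_point` with `max_relative_bias_percent`. Whether the loop is actually vectorized depends on the compiler and its flags, which this sketch does not control.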
We also tested a three-dimensional chemistry transport model (CTM), the Nested Air Quality Prediction Model System (NAQPMS), equipped with the MP CBM-Z. The tests show a clear performance improvement for the CTM after adopting the MP CBM-Z. The MP CBM-Z leads to speedups of 3.32x and 1.96x for the gas-phase chemistry module and the whole CTM, respectively, on the Intel Xeon E5-2680 platform. Moreover, on the newer Intel Xeon Gold 6132 platform, the MP CBM-Z gains 4.90x and 2.22x speedups for the gas-phase chemistry module and the whole CTM, respectively. For the KNL, the MP CBM-Z enables a 3.52x speedup for the gas-phase chemistry module, but the whole model lost 24.10% performance compared to the CPU platform due to the poor performance of other modules. In addition, since this optimization seeks to improve the utilization of the VPU, the model is better suited to new-generation processors that adopt more advanced SIMD technology. Our tests already show that the benefit of upgrading the CPU improved by about 47% with the MP CBM-Z, since the optimized code adapts better to the new hardware. This work improves the performance of the CBM-Z chemical kinetics kernel as well as the calculation efficiency of the air quality model, which can directly improve the practical value of the air quality model in scientific simulations and routine forecasting.
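The "about 47%" figure is not derived in the description. One reading that is consistent with the quoted numbers, offered here as an assumption rather than as the authors' stated calculation, takes each gas-phase chemistry speedup $S_p$ as measured against the original CBM-Z on the same platform $p$, so that $S_p = T^{\text{orig}}_p / T^{\text{MP}}_p$. The ratio of the two speedups then equals the CPU-upgrade gain with the MP code divided by the upgrade gain with the original code:

```latex
\[
\frac{S_{\text{Gold 6132}}}{S_{\text{E5-2680}}}
  = \frac{T^{\text{MP}}_{\text{E5-2680}} / T^{\text{MP}}_{\text{Gold 6132}}}
         {T^{\text{orig}}_{\text{E5-2680}} / T^{\text{orig}}_{\text{Gold 6132}}}
  = \frac{4.90}{3.32} \approx 1.476
\]
```

Under that assumption, moving from the E5-2680 to the Gold 6132 pays off roughly 47% more for the chemistry module when the MP CBM-Z is used than when the original code is used.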

Notes

We provide the box model code of MP CBM-Z. Please download the latest version of the tarball to test and reuse the code.

Files

Files (189.8 MB)

md5:640970a8cfb5ea07d54deac2963574ba