Software Open Access

Hardware Benchmark for Deep Learning Capability

Vo, Huynh Quang Nguyen


MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nmm##2200000uu#4500</leader>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">S. Deitsch, V. Christlein, S. Berger, C. Buerhop-Lutz, A. Maier, F. Gallwitz, and C. Riess, "Automatic classification of defective photovoltaic module cells in electroluminescence images," Solar Energy, vol. 185, pp. 455–468, June 2019.</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">M. Rahimzadeh and A. Attar, "A modified deep convolutional neural network for detecting COVID-19 and pneumonia from chest X-ray images based on the concatenation of Xception and ResNet50V2," Informatics in Medicine Unlocked, vol. 19, p. 100360, 2020.</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">H. Vo, "Realization and Verification of Deep Learning Models for Fault Detection and Diagnosis of Photovoltaic Modules," Master's Thesis, Aalto University, School of Electrical Engineering, 2021.</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">P. Warden, "Why GEMM is at the heart of deep learning," Pete Warden's Blog, 2015. Available at: https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">Baidu Research, "Benchmarking Deep Learning operations on different hardware". Available at: https://github.com/baidu-research/DeepBench</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, 1998.</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">H. Xiao, K. Rasul, and R. Vollgraf, "Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms," 2017. Available at: https://github.com/zalandoresearch/fashion-mnist</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, "Scikit-learn: Machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">F. Chollet, "Keras," 2015. Available at: https://github.com/fchollet/keras</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">ML Commons. Available at: https://mlcommons.org/en/</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">W. Dai and D. Berleant, "Benchmarking contemporary deep learning hardware and frameworks: A survey of qualitative metrics," 2019 IEEE First International Conference on Cognitive Machine Intelligence (CogMI), Dec. 2019.</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">deep learning</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">hardware</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">benchmarking</subfield>
  </datafield>
  <controlfield tag="005">20210608014904.0</controlfield>
  <controlfield tag="001">4905213</controlfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">RMIT Vietnam</subfield>
    <subfield code="4">rtm</subfield>
    <subfield code="a">Nguyen, Vinh Khuong</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">4597</subfield>
    <subfield code="z">md5:6fd4a8503fc2cb528954b31920323ef9</subfield>
    <subfield code="u">https://zenodo.org/record/4905213/files/Benchmark Inference.ipynb</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">11393</subfield>
    <subfield code="z">md5:e91d2124420ad7e9fb190c605b0bdad8</subfield>
    <subfield code="u">https://zenodo.org/record/4905213/files/Benchmark Simple.ipynb</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">16441</subfield>
    <subfield code="z">md5:d3469cc53a135d2b566892c44773aa1e</subfield>
    <subfield code="u">https://zenodo.org/record/4905213/files/Benchmark Type A.ipynb</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">15539</subfield>
    <subfield code="z">md5:c1ae3106f4e1f1b3c1bf9a301aa66c9f</subfield>
    <subfield code="u">https://zenodo.org/record/4905213/files/Benchmark Type B.ipynb</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">12030</subfield>
    <subfield code="z">md5:13012a2718bfcd3343955ebaa9d5004a</subfield>
    <subfield code="u">https://zenodo.org/record/4905213/files/Installation Guide.ipynb</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">202813960</subfield>
    <subfield code="z">md5:70c86da2e2050697981862f803ce79f6</subfield>
    <subfield code="u">https://zenodo.org/record/4905213/files/Model A-benchmarked.hdf5</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">1417895536</subfield>
    <subfield code="z">md5:b0c20471fa9b14c800cf66e311b8e031</subfield>
    <subfield code="u">https://zenodo.org/record/4905213/files/Model B-benchmarked.hdf5</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2021-06-07</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">software</subfield>
    <subfield code="o">oai:zenodo.org:4905213</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">Aalto University</subfield>
    <subfield code="a">Vo, Huynh Quang Nguyen</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Hardware Benchmark for Deep Learning Capability</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u">https://creativecommons.org/licenses/by/4.0/legalcode</subfield>
    <subfield code="a">Creative Commons Attribution 4.0 International</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;1. Introduction&lt;/p&gt;

&lt;p&gt;These files contain the proposed implementation for benchmarking to evaluate whether a setup of hardware is feasible for complex deep learning projects.&lt;/p&gt;

&lt;p&gt;2. Scope&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;The benchmark evaluates the performance of a setup with a single CPU, a single GPU, RAM, and memory storage. The performance of multi-CPU/multi-GPU or server-based setups is not included in our scope.&lt;/li&gt;
	&lt;li&gt;The benchmark is built on the&amp;nbsp;&lt;strong&gt;Anaconda&lt;/strong&gt;&amp;nbsp;distribution of Python and the&amp;nbsp;&lt;strong&gt;Jupyter Notebook&lt;/strong&gt;&amp;nbsp;computational environment. The deep learning models used in this benchmark are implemented with the&amp;nbsp;&lt;strong&gt;Keras&lt;/strong&gt;&amp;nbsp;application programming interface (API).&lt;/li&gt;
	&lt;li&gt;Our goal is to develop a verified approach to conduct the hardware benchmark that is quick and easy to use.&amp;nbsp;To do so, we provide benchmarking programs as well as the installation guide for Anaconda and deep learning-supported packages.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;3. Evaluation metrics&lt;/p&gt;

&lt;p&gt;There are various metrics to benchmark the performance capabilities of a setup for deep learning purposes. Here, the following metrics are used:&lt;/p&gt;

&lt;ol&gt;
	&lt;li&gt;&lt;strong&gt;Total execution time&lt;/strong&gt;: the&amp;nbsp;&lt;strong&gt;total execution time&lt;/strong&gt;&amp;nbsp;includes both the&amp;nbsp;&lt;strong&gt;total training time&lt;/strong&gt;&amp;nbsp;and the&amp;nbsp;&lt;strong&gt;total validation time&lt;/strong&gt;&amp;nbsp;of a deep learning model on a dataset after a defined number of epochs. Here, the number of epochs is 100. The lower the&amp;nbsp;&lt;strong&gt;total execution time&lt;/strong&gt;&amp;nbsp;the better.&lt;/li&gt;
	&lt;li&gt;&lt;strong&gt;Total inference time&lt;/strong&gt;: the&amp;nbsp;&lt;strong&gt;total inference time&lt;/strong&gt;&amp;nbsp;includes both the&amp;nbsp;&lt;strong&gt;model loading time&lt;/strong&gt;&amp;nbsp;(the time required to fully load a set of pre-trained weights to implement a model) and the&amp;nbsp;&lt;strong&gt;total prediction time&lt;/strong&gt;&amp;nbsp;of a deep learning model on a test dataset. Similar to the&amp;nbsp;&lt;strong&gt;total execution time&lt;/strong&gt;, the lower the&amp;nbsp;&lt;strong&gt;total inference time&lt;/strong&gt;&amp;nbsp;the better.&lt;/li&gt;
	&lt;li&gt;&lt;strong&gt;FLOPS&lt;/strong&gt;: the performance capability of a CPU or GPU can be measured by counting the number of floating-point operations (FLOP) it can execute per second. Thus, the higher the&amp;nbsp;&lt;strong&gt;FLOPS&lt;/strong&gt;, the better.&lt;/li&gt;
	&lt;li&gt;&lt;strong&gt;Computing resource issues/errors&lt;/strong&gt;: ideally, a better-performing setup will not encounter any computing resource issues/errors, including but not limited to the Out-Of-Memory (OOM) error.&lt;/li&gt;
	&lt;li&gt;&lt;strong&gt;Bottlenecking&lt;/strong&gt;: bottlenecking is subpar performance caused by the inability of one component to keep up with the others, which slows down the overall ability of a setup to process data. Here, our primary concern is bottlenecking between the CPU and GPU. The&amp;nbsp;&lt;strong&gt;bottlenecking factor&lt;/strong&gt;&amp;nbsp;is measured using an online tool:&amp;nbsp;&lt;a href="https://pc-builds.com/calculator/"&gt;Bottleneck Calculator&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
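
The first two metrics reduce to wall-clock timing around the training and prediction calls. The helper below is a minimal sketch of that idea; the name `timed` and the stand-in workload are our own illustration, not part of the benchmark notebooks, and a real run would wrap `model.fit` or `model.predict` instead.

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) once and return (result, elapsed_seconds)."""
    start = time.perf_counter()      # monotonic, high-resolution clock
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Stand-in workload; a real benchmark would time e.g.
# timed(model.fit, x_train, y_train, epochs=100) for total training time.
result, elapsed = timed(sum, range(1_000_000))
print(f"elapsed: {elapsed:.4f} s")
```

Summing the timed training and validation runs gives the total execution time; summing the weight-loading and prediction runs gives the total inference time.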

&lt;p&gt;4. Methods&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;To evaluate the hardware performance, two deep learning models are deployed for benchmarking purposes. The first model is a modified VGG19 based on a study by Deitsch et al. (&lt;strong&gt;Model A&lt;/strong&gt;) [1], and the other is a modified concatenated model proposed in a study by Rahimzadeh et al. (&lt;strong&gt;Model B&lt;/strong&gt;) [2]. These models were previously implemented in Vo et al. [3], and the model compilation, training, and validation practices are similar to those described there. In addition, several optimization practices such as the mixed precision policy are applied during model training to make it run faster and consume less memory. The following datasets are used&amp;nbsp;for benchmarking: the&amp;nbsp;&lt;strong&gt;original MNIST dataset&lt;/strong&gt;&amp;nbsp;by LeCun et al. [6], and the&amp;nbsp;&lt;strong&gt;Zalando MNIST dataset&lt;/strong&gt;&amp;nbsp;by Xiao et al. [7]&lt;/li&gt;
	&lt;li&gt;We also propose a much simpler and quicker approach to benchmarking: evaluating the&amp;nbsp;&lt;strong&gt;total execution time&lt;/strong&gt;&amp;nbsp;of a combination of basic operations. These basic operations include General Matrix to Matrix Multiplication (GEMM), 2D-Convolution (Convolve2D), and Recurrent Neural Network (RNN) operations, which exist in almost all deep neural networks today [4]. We implemented this alternative approach based on the DeepBench work by Baidu [5]:
	&lt;ul&gt;
		&lt;li&gt;In dense matrix multiplication (DMM), we define matrix C as the product of an&amp;nbsp;(MxN)&amp;nbsp;matrix and an&amp;nbsp;(NxK)&amp;nbsp;matrix. For example,&amp;nbsp;(3072,128,1024)&amp;nbsp;means the resulting matrix is the product of a&amp;nbsp;(3072x128)&amp;nbsp;matrix and a&amp;nbsp;(128x1024)&amp;nbsp;matrix. To benchmark, we implemented five different multiplications and measured their overall&amp;nbsp;&lt;strong&gt;total execution time&lt;/strong&gt;. These multiplications were&amp;nbsp;(3072,128,1024),&amp;nbsp;(5124,9124,2560),&amp;nbsp;(2560,64,2560),&amp;nbsp;(7860,64,2560), and&amp;nbsp;(1760,128,1760).&lt;/li&gt;
		&lt;li&gt;In sparse matrix multiplication (SMM), we define matrix C as the product of an&amp;nbsp;(MxN)&amp;nbsp;matrix and an&amp;nbsp;(NxK)&amp;nbsp;matrix, where&amp;nbsp;(100 - Dx100)%&amp;nbsp;of the&amp;nbsp;(MxN)&amp;nbsp;matrix is omitted. For instance,&amp;nbsp;(10752,1,3584,0.9)&amp;nbsp;means the resulting matrix is the product of a&amp;nbsp;(10752x1)&amp;nbsp;matrix and a&amp;nbsp;(1x3584)&amp;nbsp;matrix, while 10% of the&amp;nbsp;(10752x1)&amp;nbsp;matrix is omitted. To benchmark, we implemented four different multiplications and measured their overall&amp;nbsp;&lt;strong&gt;total execution time&lt;/strong&gt;. These multiplications were&amp;nbsp;(10752,1,3584,0.9),&amp;nbsp;(7680,1500,2560,0.95),&amp;nbsp;(7680,2,2560,0.95), and&amp;nbsp;(7680,1,2560,0.95).&lt;/li&gt;
		&lt;li&gt;In Convolve2D, we defined a simple model containing only convolution and pooling layers&amp;nbsp;and measured the resulting&amp;nbsp;&lt;strong&gt;total execution time&lt;/strong&gt;. The dataset used for training this model is the&amp;nbsp;&lt;strong&gt;Zalando MNIST&lt;/strong&gt;&amp;nbsp;by Xiao et al. [7]&lt;/li&gt;
		&lt;li&gt;We did not implement the&amp;nbsp;&lt;strong&gt;RNN&lt;/strong&gt;&amp;nbsp;due to several issues caused by the new version of Keras.&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
	&lt;li&gt;To evaluate the&amp;nbsp;&lt;strong&gt;total inference time&lt;/strong&gt;, we loaded the trained weights with the best validation accuracy from our models (denoted as&amp;nbsp;&lt;strong&gt;Model A-benchmarked&lt;/strong&gt;&amp;nbsp;and&amp;nbsp;&lt;strong&gt;Model B-benchmarked&lt;/strong&gt;, respectively) and conducted a prediction run on the test set of the&amp;nbsp;&lt;strong&gt;Zalando MNIST&lt;/strong&gt;. These files are available on Zenodo:&amp;nbsp;&lt;a href="https://zenodo.org/record/4905213#.YL1-P_kzaUk"&gt;Inference Models&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
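
The DMM step above can be sketched in a few lines of NumPy: multiply an (MxN) matrix by an (NxK) matrix for each listed configuration and sum the wall-clock times. This is a simplified illustration of the idea, not the exact code of the benchmark notebooks; the function name `dmm_total_time` and the random test data are our own assumptions.

```python
import time
import numpy as np

# The five (M, N, K) configurations listed in the text.
DMM_SHAPES = [
    (3072, 128, 1024),
    (5124, 9124, 2560),
    (2560, 64, 2560),
    (7860, 64, 2560),
    (1760, 128, 1760),
]

def dmm_total_time(shapes, dtype=np.float32, seed=0):
    """Sum the wall-clock time of one GEMM per (M, N, K) configuration."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for m, n, k in shapes:
        a = rng.standard_normal((m, n), dtype=dtype)
        b = rng.standard_normal((n, k), dtype=dtype)
        start = time.perf_counter()
        c = a @ b                     # GEMM: (M, N) x (N, K) -> (M, K)
        total += time.perf_counter() - start
        assert c.shape == (m, k)
    return total

print(f"DMM total execution time: {dmm_total_time(DMM_SHAPES):.3f} s")
```

Only the multiplication itself is timed, so the cost of generating the operands does not pollute the measurement; the SMM variant would additionally zero out (100 - Dx100)% of the (MxN) operand before the product.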

&lt;p&gt;5. References&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;[1]&amp;nbsp;S. Deitsch, V. Christlein, S. Berger, C. Buerhop-Lutz, A. Maier, F. Gallwitz, and C. Riess, &amp;ldquo;Automatic classification of defective photovoltaic module cells in electroluminescence images,&amp;rdquo; Solar Energy, vol. 185, pp. 455&amp;ndash;468, June 2019.&lt;/li&gt;
	&lt;li&gt;[2]&amp;nbsp;M. Rahimzadeh and A. Attar, &amp;ldquo;A modified deep convolutional neural network for detecting COVID-19 and pneumonia from chest X-ray images based on the concatenation of Xception and ResNet50V2,&amp;rdquo; Informatics in Medicine Unlocked, vol. 19, p. 100360, 2020.&lt;/li&gt;
	&lt;li&gt;[3]&amp;nbsp;H. Vo, &amp;ldquo;Realization and Verification of Deep Learning Models for Fault Detection and Diagnosis of Photovoltaic Modules,&amp;rdquo; Master&amp;rsquo;s Thesis, Aalto University, School of Electrical Engineering, 2021.&lt;/li&gt;
	&lt;li&gt;[4]&amp;nbsp;P. Warden, &amp;quot;Why GEMM is at the heart of deep learning,&amp;quot; Pete Warden&amp;#39;s Blog, 2015. Available at:&amp;nbsp;&lt;a href="https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/"&gt;https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/&lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;[5]&amp;nbsp;Baidu Research, &amp;quot;Benchmarking Deep Learning operations on different hardware&amp;quot;. Available at:&amp;nbsp;&lt;a href="https://github.com/baidu-research/DeepBench"&gt;https://github.com/baidu-research/DeepBench&lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;[6]&amp;nbsp;Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, &amp;quot;Gradient-based learning applied to document recognition,&amp;quot; Proceedings of the IEEE, 1998.&lt;/li&gt;
	&lt;li&gt;[7]&amp;nbsp;H. Xiao, K. Rasul, and R. Vollgraf, &amp;ldquo;Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms,&amp;rdquo; 2017.&amp;nbsp;&lt;a href="https://github.com/zalandoresearch/fashion-mnist"&gt;https://github.com/zalandoresearch/fashion-mnist&lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;[8]&amp;nbsp;F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, &amp;ldquo;Scikit-learn: Machine learning in Python,&amp;rdquo; Journal of Machine Learning Research, vol. 12, pp. 2825&amp;ndash;2830, 2011.&lt;/li&gt;
	&lt;li&gt;[9]&amp;nbsp;F. Chollet, &amp;ldquo;Keras,&amp;rdquo; 2015. Available at:&amp;nbsp;&lt;a href="https://github.com/fchollet/keras"&gt;https://github.com/fchollet/keras&lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;[10]&amp;nbsp;ML Commons. Available at:&amp;nbsp;&lt;a href="https://mlcommons.org/en/"&gt;https://mlcommons.org/en/&lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;[11]&amp;nbsp;W. Dai and D. Berleant, &amp;ldquo;Benchmarking contemporary deep learning hardware and frameworks: A survey of qualitative metrics,&amp;rdquo; 2019 IEEE First International Conference on Cognitive Machine Intelligence (CogMI), Dec. 2019.&lt;/li&gt;
&lt;/ul&gt;

</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.4905212</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.4905213</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">software</subfield>
  </datafield>
</record>