{
  "DOI": "10.5281/zenodo.4905213",
  "abstract": "1. Introduction\n\n\nThese files contain the proposed implementation for benchmarking to evaluate whether a setup of hardware is feasible for complex deep learning projects.\n\n\n2. Scope\u00a0\n\n\n\n\t\nThe benchmark evaluates the performance of a setup having a single CPU, a single GPU, RAM and memory storage. The performance of multi-CPUs/multi-GPUs or server-based is included in our scope.\n\t\nThe benchmark is built on the\u00a0Anaconda\u00a0distribution of Python, and the\u00a0Jupyter Notebook\u00a0computational environment. The deep learning models mentioned in this benchmarked are implemented using the\u00a0Keras\u00a0application programming interface (API).\n\t\nOur goal is to develop a verified approach to conduct the hardware benchmark that is quick and easy to use.\u00a0To do so, we provide benchmarking programs as well as the installation guide for Anaconda and deep learning-supported packages.\n\n\n\n3. Evaluation metrics\n\n\n\u00a0There are various metrics to benchmark the performance capabilities of a setup for deep learning purposes. Here, the following metrics are used:\n\n\n\n\t\nTotal execution time: the\u00a0total execution time\u00a0includes both the\u00a0total training time\u00a0and the\u00a0total validation time\u00a0of a deep learning model on a dataset after a defined number of epochs. Here, the number of epochs is 100. The lower the\u00a0total execution time\u00a0the better.\n\t\nTotal inference time: the\u00a0total inference time\u00a0includes both the\u00a0model loading time\u00a0(the time required to fully load a set of pre-trained weights to implement a model) and the\u00a0total prediction time\u00a0of a deep learning model on a test dataset. Similar to the\u00a0total execution time, the lower the\u00a0total inference time\u00a0the better.\n\t\nFLOPS: the performance capability of a CPU or GPU can be measured by counting the number of floating operation points (FLO) it can execute per second. 
Thus, the higher the FLOPS, the better.\n\t\nComputing resource issues/errors: ideally, a better-performing setup will not encounter any computing resource issues or errors, including but not limited to the Out-Of-Memory (OOM) error.\n\t\nBottlenecking: put simply, bottlenecking is subpar performance caused by the inability of one component to keep up with the others, which slows down the overall ability of a setup to process data. Here, our primary concern is bottlenecking between the CPU and the GPU. The bottlenecking factor is measured using an online tool: Bottleneck Calculator\n\n\n\n4. Methods\n\n\n\n\t\nTo evaluate the hardware performance, two deep learning models are deployed for benchmarking purposes. The first model is a modified VGG19 based on a study by Deitsch et al. (Model A) [1], and the other is a modified concatenated model proposed in a study by Rahimzadeh and Attar (Model B) [2]. These models were previously implemented in Vo [3]. The model compilation, training and validation practices are similar to those described in Vo [3]. In addition, several optimization practices, such as a mixed precision policy, are applied during model training to make it run faster and consume less memory. The following datasets are used for benchmarking: the original MNIST dataset by LeCun et al. [6] and the Zalando MNIST dataset by Xiao et al. [7].\n\t\nIn addition, we propose another approach to benchmarking that is much simpler and quicker: evaluating the total execution time for a combination of basic operations. These basic operations include General Matrix to Matrix Multiplication (GEMM), 2D-Convolution (Convolve2D) and Recurrent Neural Network (RNN), and exist in almost all deep neural networks today [4].
We implemented our alternative approach based on the DeepBench work by Baidu [5]:\n\t\n\n\t\t\nIn dense matrix multiplication (DMM), we defined matrix C as the product of (MxN) and (NxK) matrices. For example, (3072,128,1024) means the resulting matrix is the product of (3072x128) and (128x1024) matrices. To benchmark, we implemented five different multiplications and measured the overall total execution time of these five. These multiplications included (3072,128,1024), (5124,9124,2560), (2560,64,2560), (7860,64,2560), and (1760,128,1760).\n\t\t\nIn sparse matrix multiplication (SMM), we defined matrix C as the product of (MxN) and (NxK) matrices, where (100 - Dx100)% of the (MxN) matrix is omitted. For instance, (10752,1,3584,0.9) means the resulting matrix is the product of (10752x1) and (1x3584) matrices, while 10% of the (10752x1) matrix is omitted. To benchmark, we implemented four different multiplications and measured the overall total execution time of these four. These multiplications included (10752,1,3584,0.9), (7680,1500,2560,0.95), (7680,2,2560,0.95), and (7680,1,2560,0.95).\n\t\t\nIn Convolve2D, we defined a simple model containing only convolution layers and pooling layers and measured the resulting total execution time. The dataset used for training this model is the Zalando MNIST by Xiao et al.\n\t\t\nWe did not implement the RNN due to several issues caused by the new version of Keras.\n\t\n\t\n\t\nTo evaluate the total inference time, we loaded the already trained weights from our models (denoted as Model A-benchmarked and Model B-benchmarked, respectively) that achieved the best validation accuracy and conducted a prediction run on the test set from the Zalando MNIST.
These files are available on Zenodo: Inference Models\n\n\n\n5. References\n\n\n\n\t\n[1] S. Deitsch, V. Christlein, S. Berger, C. Buerhop-Lutz, A. Maier, F. Gallwitz, and C. Riess, \u201cAutomatic classification of defective photovoltaic module cells in electroluminescence images,\u201d Solar Energy, vol. 185, pp. 455\u2013468, Jun. 2019.\n\t\n[2] M. Rahimzadeh and A. Attar, \u201cA modified deep convolutional neural network for detecting COVID-19 and pneumonia from chest X-ray images based on the concatenation of Xception and ResNet50V2,\u201d Informatics in Medicine Unlocked, vol. 19, p. 100360, 2020.\n\t\n[3] H. Vo, \u201cRealization and Verification of Deep Learning Models for Fault Detection and Diagnosis of Photovoltaic Modules,\u201d Master\u2019s Thesis, Aalto University, School of Electrical Engineering, 2021.\n\t\n[4] P. Warden, \u201cWhy GEMM is at the heart of deep learning,\u201d Pete Warden's Blog, 2015. Available at: https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/\n\t\n[5] Baidu Research, \u201cBenchmarking Deep Learning operations on different hardware.\u201d Available at: https://github.com/baidu-research/DeepBench\n\t\n[6] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, \u201cGradient-based learning applied to document recognition,\u201d Proceedings of the IEEE, 1998.\n\t\n[7] H. Xiao, K. Rasul, and R. Vollgraf, \u201cA Novel Image Dataset for Benchmarking Machine Learning Algorithms,\u201d 2017. Available at: https://github.com/zalandoresearch/fashion-mnist\n\t\n[8] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, \u201cScikit-learn: Machine learning in Python,\u201d Journal of Machine Learning Research, vol. 12, pp. 2825\u20132830, 2011.\n\t\n[9] F. Chollet, \u201cKeras,\u201d 2015.
Available at: https://github.com/fchollet/keras\n\t\n[10] MLCommons. Available at: https://mlcommons.org/en/\n\t\n[11] W. Dai and D. Berleant, \u201cBenchmarking contemporary deep learning hardware and frameworks: A survey of qualitative metrics,\u201d 2019 IEEE First International Conference on Cognitive Machine Intelligence (CogMI), Dec. 2019.",
  "author": [
    {
      "family": "Vo",
      "given": "Huynh Quang Nguyen"
    }
  ],
  "id": "4905213",
  "issued": {
    "date-parts": [
      [
        "2021",
        "06",
        "07"
      ]
    ]
  },
  "publisher": "Zenodo",
  "title": "Hardware Benchmark for Deep Learning Capability",
  "type": "software",
  "version": "1.00.00"
}