Software Open Access

Hardware Benchmark for Deep Learning Capability

Vo, Huynh Quang Nguyen


MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nmm##2200000uu#4500</leader>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">S. Deitsch, V. Christlein, S. Berger, C. Buerhop-Lutz, A. Maier, F. Gallwitz, and C. Riess, "Automatic classification of defective photovoltaic module cells in electroluminescence images," Solar Energy, vol. 185, pp. 455–468, June 2019.</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">M. Rahimzadeh and A. Attar, "A modified deep convolutional neural network for detecting COVID-19 and pneumonia from chest X-ray images based on the concatenation of Xception and ResNet50V2," Informatics in Medicine Unlocked, vol. 19, p. 100360, 2020.</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">H. Vo, "Realization and Verification of Deep Learning Models for Fault Detection and Diagnosis of Photovoltaic Modules," Master's Thesis, Aalto University, School of Electrical Engineering, 2021.</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">P. Warden, "Why GEMM is at the heart of deep learning," Pete Warden's Blog, 2015. Available at: https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">Baidu Research, "Benchmarking Deep Learning operations on different hardware". Available at: https://github.com/baidu-research/DeepBench</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, 1998.</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">H. Xiao, K. Rasul, and R. Vollgraf, "Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms," 2017. Available at: https://github.com/zalandoresearch/fashion-mnist</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, "Scikit-learn: Machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">F. Chollet, "Keras," 2015. Available at: https://github.com/fchollet/keras</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">ML Commons. Available at: https://mlcommons.org/en/</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">W. Dai and D. Berleant, "Benchmarking contemporary deep learning hardware and frameworks: A survey of qualitative metrics," 2019 IEEE First International Conference on Cognitive Machine Intelligence (CogMI), Dec. 2019.</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">deep learning</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">hardware</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">benchmarking</subfield>
  </datafield>
  <controlfield tag="005">20210608014904.0</controlfield>
  <controlfield tag="001">4905213</controlfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">RMIT Vietnam</subfield>
    <subfield code="4">rtm</subfield>
    <subfield code="a">Nguyen, Vinh Khuong</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">4597</subfield>
    <subfield code="z">md5:6fd4a8503fc2cb528954b31920323ef9</subfield>
    <subfield code="u">https://zenodo.org/record/4905213/files/Benchmark Inference.ipynb</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">11393</subfield>
    <subfield code="z">md5:e91d2124420ad7e9fb190c605b0bdad8</subfield>
    <subfield code="u">https://zenodo.org/record/4905213/files/Benchmark Simple.ipynb</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">16441</subfield>
    <subfield code="z">md5:d3469cc53a135d2b566892c44773aa1e</subfield>
    <subfield code="u">https://zenodo.org/record/4905213/files/Benchmark Type A.ipynb</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">15539</subfield>
    <subfield code="z">md5:c1ae3106f4e1f1b3c1bf9a301aa66c9f</subfield>
    <subfield code="u">https://zenodo.org/record/4905213/files/Benchmark Type B.ipynb</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">12030</subfield>
    <subfield code="z">md5:13012a2718bfcd3343955ebaa9d5004a</subfield>
    <subfield code="u">https://zenodo.org/record/4905213/files/Installation Guide.ipynb</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">202813960</subfield>
    <subfield code="z">md5:70c86da2e2050697981862f803ce79f6</subfield>
    <subfield code="u">https://zenodo.org/record/4905213/files/Model A-benchmarked.hdf5</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">1417895536</subfield>
    <subfield code="z">md5:b0c20471fa9b14c800cf66e311b8e031</subfield>
    <subfield code="u">https://zenodo.org/record/4905213/files/Model B-benchmarked.hdf5</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2021-06-07</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">software</subfield>
    <subfield code="o">oai:zenodo.org:4905213</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">Aalto University</subfield>
    <subfield code="a">Vo, Huynh Quang Nguyen</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Hardware Benchmark for Deep Learning Capability</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u">https://creativecommons.org/licenses/by/4.0/legalcode</subfield>
    <subfield code="a">Creative Commons Attribution 4.0 International</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;1. Introduction&lt;/p&gt;

&lt;p&gt;These files contain the proposed implementation for benchmarking to evaluate whether a setup of hardware is feasible for complex deep learning projects.&lt;/p&gt;

&lt;p&gt;2. Scope&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;The benchmark evaluates the performance of a setup with a single CPU, a single GPU, RAM, and memory storage. The performance of multi-CPU/multi-GPU or server-based setups is not included in our scope.&lt;/li&gt;
	&lt;li&gt;The benchmark is built on the&amp;nbsp;&lt;strong&gt;Anaconda&lt;/strong&gt;&amp;nbsp;distribution of Python and the&amp;nbsp;&lt;strong&gt;Jupyter Notebook&lt;/strong&gt;&amp;nbsp;computational environment. The deep learning models used in this benchmark are implemented with the&amp;nbsp;&lt;strong&gt;Keras&lt;/strong&gt;&amp;nbsp;application programming interface (API).&lt;/li&gt;
	&lt;li&gt;Our goal is to develop a verified approach to conduct the hardware benchmark that is quick and easy to use.&amp;nbsp;To do so, we provide benchmarking programs as well as the installation guide for Anaconda and deep learning-supported packages.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;3. Evaluation metrics&lt;/p&gt;

&lt;p&gt;There are various metrics to benchmark the performance capabilities of a setup for deep learning purposes. Here, the following metrics are used:&lt;/p&gt;

&lt;ol&gt;
	&lt;li&gt;&lt;strong&gt;Total execution time&lt;/strong&gt;: the&amp;nbsp;&lt;strong&gt;total execution time&lt;/strong&gt;&amp;nbsp;includes both the&amp;nbsp;&lt;strong&gt;total training time&lt;/strong&gt;&amp;nbsp;and the&amp;nbsp;&lt;strong&gt;total validation time&lt;/strong&gt;&amp;nbsp;of a deep learning model on a dataset after a defined number of epochs. Here, the number of epochs is 100. The lower the&amp;nbsp;&lt;strong&gt;total execution time&lt;/strong&gt;&amp;nbsp;the better.&lt;/li&gt;
	&lt;li&gt;&lt;strong&gt;Total inference time&lt;/strong&gt;: the&amp;nbsp;&lt;strong&gt;total inference time&lt;/strong&gt;&amp;nbsp;includes both the&amp;nbsp;&lt;strong&gt;model loading time&lt;/strong&gt;&amp;nbsp;(the time required to fully load a set of pre-trained weights to implement a model) and the&amp;nbsp;&lt;strong&gt;total prediction time&lt;/strong&gt;&amp;nbsp;of a deep learning model on a test dataset. Similar to the&amp;nbsp;&lt;strong&gt;total execution time&lt;/strong&gt;, the lower the&amp;nbsp;&lt;strong&gt;total inference time&lt;/strong&gt;&amp;nbsp;the better.&lt;/li&gt;
	&lt;li&gt;&lt;strong&gt;FLOPS&lt;/strong&gt;: the performance capability of a CPU or GPU can be measured by counting the number of floating-point operations (FLOP) it can execute per second. Thus, the higher the&amp;nbsp;&lt;strong&gt;FLOPS&lt;/strong&gt;, the better.&lt;/li&gt;
	&lt;li&gt;&lt;strong&gt;Computing resource issues/errors&lt;/strong&gt;: ideally, a better-performing setup will not encounter any computing resource issues/errors, including but not limited to the Out-Of-Memory (OOM) error.&lt;/li&gt;
	&lt;li&gt;&lt;strong&gt;Bottlenecking&lt;/strong&gt;: bottlenecking is subpar performance caused by the inability of one component to keep up with the others, which slows down the overall ability of a setup to process data. Here, our primary concern is bottlenecking between the CPU and GPU. The&amp;nbsp;&lt;strong&gt;bottlenecking factor&lt;/strong&gt;&amp;nbsp;is measured using an online tool:&amp;nbsp;&lt;a href="https://pc-builds.com/calculator/"&gt;Bottleneck Calculator&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
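
The first two metrics reduce to wall-clock timing around the training and prediction calls. The helper below is a minimal sketch of that idea; the name `timed` and the stand-in workload are our own illustration, not part of the benchmark notebooks, and a real run would wrap `model.fit` or `model.predict` instead.

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) once and return (result, elapsed_seconds)."""
    start = time.perf_counter()      # monotonic, high-resolution clock
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Stand-in workload; a real benchmark would time e.g.
# timed(model.fit, x_train, y_train, epochs=100) for total training time.
result, elapsed = timed(sum, range(1_000_000))
print(f"elapsed: {elapsed:.4f} s")
```

Summing the timed training and validation runs gives the total execution time; summing the weight-loading and prediction runs gives the total inference time.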

&lt;p&gt;4. Methods&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;To evaluate the hardware performance, two deep learning models are deployed for benchmarking purposes. The first model is a modified VGG19 based on a study by Deitsch et al. (&lt;strong&gt;Model A&lt;/strong&gt;) [1], and the other is a modified concatenated model proposed in a study by Rahimzadeh et al. (&lt;strong&gt;Model B&lt;/strong&gt;) [2]. These models were previously implemented in Vo et al. [3], and the model compilation, training, and validation practices are similar to those described there. In addition, several optimization practices such as the mixed precision policy are applied during model training to make it run faster and consume less memory. The following datasets are used&amp;nbsp;for benchmarking: the&amp;nbsp;&lt;strong&gt;original MNIST dataset&lt;/strong&gt;&amp;nbsp;by LeCun et al. [6], and the&amp;nbsp;&lt;strong&gt;Zalando MNIST dataset&lt;/strong&gt;&amp;nbsp;by Xiao et al. [7]&lt;/li&gt;
	&lt;li&gt;We also propose a much simpler and quicker approach to benchmarking: evaluating the&amp;nbsp;&lt;strong&gt;total execution time&lt;/strong&gt;&amp;nbsp;of a combination of basic operations. These basic operations include General Matrix to Matrix Multiplication (GEMM), 2D-Convolution (Convolve2D), and Recurrent Neural Network (RNN) operations, which exist in almost all deep neural networks today [4]. We implemented this alternative approach based on the DeepBench work by Baidu [5]:
	&lt;ul&gt;
		&lt;li&gt;In dense matrix multiplication (DMM), we define matrix C as the product of an&amp;nbsp;(MxN)&amp;nbsp;matrix and an&amp;nbsp;(NxK)&amp;nbsp;matrix. For example,&amp;nbsp;(3072,128,1024)&amp;nbsp;means the resulting matrix is the product of a&amp;nbsp;(3072x128)&amp;nbsp;matrix and a&amp;nbsp;(128x1024)&amp;nbsp;matrix. To benchmark, we implemented five different multiplications and measured their overall&amp;nbsp;&lt;strong&gt;total execution time&lt;/strong&gt;. These multiplications were&amp;nbsp;(3072,128,1024),&amp;nbsp;(5124,9124,2560),&amp;nbsp;(2560,64,2560),&amp;nbsp;(7860,64,2560), and&amp;nbsp;(1760,128,1760).&lt;/li&gt;
		&lt;li&gt;In sparse matrix multiplication (SMM), we define matrix C as the product of an&amp;nbsp;(MxN)&amp;nbsp;matrix and an&amp;nbsp;(NxK)&amp;nbsp;matrix, where&amp;nbsp;(100 - Dx100)%&amp;nbsp;of the&amp;nbsp;(MxN)&amp;nbsp;matrix is omitted. For instance,&amp;nbsp;(10752,1,3584,0.9)&amp;nbsp;means the resulting matrix is the product of a&amp;nbsp;(10752x1)&amp;nbsp;matrix and a&amp;nbsp;(1x3584)&amp;nbsp;matrix, while 10% of the&amp;nbsp;(10752x1)&amp;nbsp;matrix is omitted. To benchmark, we implemented four different multiplications and measured their overall&amp;nbsp;&lt;strong&gt;total execution time&lt;/strong&gt;. These multiplications were&amp;nbsp;(10752,1,3584,0.9),&amp;nbsp;(7680,1500,2560,0.95),&amp;nbsp;(7680,2,2560,0.95), and&amp;nbsp;(7680,1,2560,0.95).&lt;/li&gt;
		&lt;li&gt;In Convolve2D, we defined a simple model containing only convolution and pooling layers&amp;nbsp;and measured the resulting&amp;nbsp;&lt;strong&gt;total execution time&lt;/strong&gt;. The dataset used for training this model is the&amp;nbsp;&lt;strong&gt;Zalando MNIST&lt;/strong&gt;&amp;nbsp;by Xiao et al. [7]&lt;/li&gt;
		&lt;li&gt;We did not implement the&amp;nbsp;&lt;strong&gt;RNN&lt;/strong&gt;&amp;nbsp;due to several issues caused by the new version of Keras.&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
	&lt;li&gt;To evaluate the&amp;nbsp;&lt;strong&gt;total inference time&lt;/strong&gt;, we loaded the trained weights with the best validation accuracy from our models (denoted as&amp;nbsp;&lt;strong&gt;Model A-benchmarked&lt;/strong&gt;&amp;nbsp;and&amp;nbsp;&lt;strong&gt;Model B-benchmarked&lt;/strong&gt;, respectively) and conducted a prediction run on the test set of the&amp;nbsp;&lt;strong&gt;Zalando MNIST&lt;/strong&gt;. These files are available on Zenodo:&amp;nbsp;&lt;a href="https://zenodo.org/record/4905213#.YL1-P_kzaUk"&gt;Inference Models&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
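
The DMM step above can be sketched in a few lines of NumPy: multiply an (MxN) matrix by an (NxK) matrix for each listed configuration and sum the wall-clock times. This is a simplified illustration of the idea, not the exact code of the benchmark notebooks; the function name `dmm_total_time` and the random test data are our own assumptions.

```python
import time
import numpy as np

# The five (M, N, K) configurations listed in the text.
DMM_SHAPES = [
    (3072, 128, 1024),
    (5124, 9124, 2560),
    (2560, 64, 2560),
    (7860, 64, 2560),
    (1760, 128, 1760),
]

def dmm_total_time(shapes, dtype=np.float32, seed=0):
    """Sum the wall-clock time of one GEMM per (M, N, K) configuration."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for m, n, k in shapes:
        a = rng.standard_normal((m, n), dtype=dtype)
        b = rng.standard_normal((n, k), dtype=dtype)
        start = time.perf_counter()
        c = a @ b                     # GEMM: (M, N) x (N, K) -> (M, K)
        total += time.perf_counter() - start
        assert c.shape == (m, k)
    return total

print(f"DMM total execution time: {dmm_total_time(DMM_SHAPES):.3f} s")
```

Only the multiplication itself is timed, so the cost of generating the operands does not pollute the measurement; the SMM variant would additionally zero out (100 - Dx100)% of the (MxN) operand before the product.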

&lt;p&gt;5. References&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;[1]&amp;nbsp;S. Deitsch, V. Christlein, S. Berger, C. Buerhop-Lutz, A. Maier, F. Gallwitz, and C. Riess, &amp;ldquo;Automatic classification of defective photovoltaic module cells in electroluminescence images,&amp;rdquo; Solar Energy, vol. 185, pp. 455&amp;ndash;468, June 2019.&lt;/li&gt;
	&lt;li&gt;[2]&amp;nbsp;M. Rahimzadeh and A. Attar, &amp;ldquo;A modified deep convolutional neural network for detecting COVID-19 and pneumonia from chest X-ray images based on the concatenation of Xception and ResNet50V2,&amp;rdquo; Informatics in Medicine Unlocked, vol. 19, p. 100360, 2020.&lt;/li&gt;
	&lt;li&gt;[3]&amp;nbsp;H. Vo, &amp;ldquo;Realization and Verification of Deep Learning Models for Fault Detection and Diagnosis of Photovoltaic Modules,&amp;rdquo; Master&amp;rsquo;s Thesis, Aalto University, School of Electrical Engineering, 2021.&lt;/li&gt;
	&lt;li&gt;[4]&amp;nbsp;P. Warden, &amp;quot;Why GEMM is at the heart of deep learning,&amp;quot; Pete Warden&amp;#39;s Blog, 2015. Available at:&amp;nbsp;&lt;a href="https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/"&gt;https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/&lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;[5]&amp;nbsp;Baidu Research, &amp;quot;Benchmarking Deep Learning operations on different hardware&amp;quot;. Available at:&amp;nbsp;&lt;a href="https://github.com/baidu-research/DeepBench"&gt;https://github.com/baidu-research/DeepBench&lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;[6]&amp;nbsp;Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, &amp;quot;Gradient-based learning applied to document recognition,&amp;quot; Proceedings of the IEEE, 1998.&lt;/li&gt;
	&lt;li&gt;[7]&amp;nbsp;H. Xiao, K. Rasul, and R. Vollgraf, &amp;ldquo;Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms,&amp;rdquo; 2017.&amp;nbsp;&lt;a href="https://github.com/zalandoresearch/fashion-mnist"&gt;https://github.com/zalandoresearch/fashion-mnist&lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;[8]&amp;nbsp;F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, &amp;ldquo;Scikit-learn: Machine learning in Python,&amp;rdquo; Journal of Machine Learning Research, vol. 12, pp. 2825&amp;ndash;2830, 2011.&lt;/li&gt;
	&lt;li&gt;[9]&amp;nbsp;F. Chollet, &amp;ldquo;Keras,&amp;rdquo; 2015. Available at:&amp;nbsp;&lt;a href="https://github.com/fchollet/keras"&gt;https://github.com/fchollet/keras&lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;[10]&amp;nbsp;ML Commons. Available at:&amp;nbsp;&lt;a href="https://mlcommons.org/en/"&gt;https://mlcommons.org/en/&lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;[11]&amp;nbsp;W. Dai and D. Berleant, &amp;ldquo;Benchmarking contemporary deep learning hardware and frameworks: A survey of qualitative metrics,&amp;rdquo; 2019 IEEE First International Conference on Cognitive Machine Intelligence (CogMI), Dec. 2019.&lt;/li&gt;
&lt;/ul&gt;

</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.4905212</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.4905213</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">software</subfield>
  </datafield>
</record>