Published June 30, 2023 | License: CC BY-NC-ND 4.0
Journal article | Open Access

Characterizing Adaptive Optimizer in CNN by Reverse Mode Differentiation from Full-Scratch

  • 1. National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, Japan.
  • 2. Musashino University, Department of Data Science, 3-3-3 Ariake, Koto-ku, Tokyo, Japan.

Contributors

Contact person:

  • 1. National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, Japan.

Description

Abstract: Recently, datasets have been found on which adaptive optimizers offer no clear advantage, yet no evaluation criteria have been established for deciding which optimization algorithm is appropriate. In this paper, we propose a characterization method that implements reverse-mode (backward) automatic differentiation and characterizes the optimizer by tracking, at each epoch, the gradient and the value of the signal flowing into the output layer. The proposed method was applied to a CNN (Convolutional Neural Network) recognizing CIFAR-10, and experiments were conducted comparing Adam (adaptive moment estimation) and SGD (stochastic gradient descent). The experiments revealed that, for batch sizes of 50, 100, 150, and 200, SGD and Adam differ significantly in the characteristics of the time series of signals sent to the output layer. This shows that the Adam optimizer can be clearly characterized from the input signal series for each batch size.
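
For illustration only (this is not the authors' implementation), the following minimal NumPy sketch shows the kind of bookkeeping the abstract describes: a hand-written reverse-mode (backward) pass through a small network, trained with either SGD or Adam, while recording the mean absolute value of the signal entering the output layer at each epoch. The synthetic data, two-layer fully connected network, and hyperparameters are assumptions standing in for the paper's CNN/CIFAR-10 setup.

# Minimal from-scratch sketch (illustrative assumption, not the paper's code):
# hand-written reverse-mode backprop for a tiny two-layer network, trained with
# either plain SGD or Adam, logging the mean |signal| entering the output layer
# once per epoch.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 32))            # synthetic inputs (stand-in for CIFAR-10 images)
y = rng.integers(0, 10, size=200)         # synthetic labels, 10 classes

def one_hot(labels, k=10):
    out = np.zeros((labels.size, k))
    out[np.arange(labels.size), labels] = 1.0
    return out

def train(optimizer="sgd", epochs=5, lr=1e-2, batch=50):
    W1 = rng.normal(scale=0.1, size=(32, 64)); b1 = np.zeros(64)
    W2 = rng.normal(scale=0.1, size=(64, 10)); b2 = np.zeros(10)
    params = [W1, b1, W2, b2]
    m = [np.zeros_like(p) for p in params]     # Adam first moments
    v = [np.zeros_like(p) for p in params]     # Adam second moments
    beta1, beta2, eps, t = 0.9, 0.999, 1e-8, 0
    signal_trace = []                          # per-epoch mean |signal| into the output layer
    for _ in range(epochs):
        for i in range(0, len(X), batch):
            xb, yb = X[i:i + batch], one_hot(y[i:i + batch])
            # forward pass
            h = np.maximum(0.0, xb @ W1 + b1)  # ReLU hidden layer
            z = h @ W2 + b2                    # signal flowing into the output layer
            probs = np.exp(z - z.max(axis=1, keepdims=True))
            probs /= probs.sum(axis=1, keepdims=True)
            # reverse-mode (backward) pass, written out by hand
            dz = (probs - yb) / len(xb)        # softmax + cross-entropy gradient
            dh = (dz @ W2.T) * (h > 0)         # back through the ReLU
            grads = [xb.T @ dh, dh.sum(axis=0), h.T @ dz, dz.sum(axis=0)]
            t += 1
            for j, (p, g) in enumerate(zip(params, grads)):
                if optimizer == "adam":
                    m[j] = beta1 * m[j] + (1 - beta1) * g
                    v[j] = beta2 * v[j] + (1 - beta2) * g * g
                    m_hat = m[j] / (1 - beta1 ** t)
                    v_hat = v[j] / (1 - beta2 ** t)
                    p -= lr * m_hat / (np.sqrt(v_hat) + eps)
                else:
                    p -= lr * g                # plain SGD step
        # record the mean absolute pre-softmax signal over the whole dataset
        h_all = np.maximum(0.0, X @ W1 + b1)
        signal_trace.append(float(np.mean(np.abs(h_all @ W2 + b2))))
    return signal_trace

print("SGD :", train("sgd"))
print("Adam:", train("adam"))

Comparing the two printed traces epoch by epoch is the kind of time-series comparison the abstract refers to; the batch size argument can be varied (e.g. 50, 100, 150, 200) to reproduce the per-batch-size comparison in miniature.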

Notes

Published By: Lattice Science Publication (LSP) © Copyright: All rights reserved.

Files

D1070063423.pdf (684.7 kB)
md5:aa4827e57ecbcef58c5fa1c1f275782e

Additional details

Related works

Is cited by: Journal article (ISSN 2582-7626)

References

  • Ashia C. Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, Benjamin Recht: The Marginal Value of Adaptive Gradient Methods in Machine Learning. CoRR abs/1705.08292 (2017)
  • PyTorch. https://github.com/pytorch/pytorch
  • Martin Abadi et al.: TensorFlow: A System for Large-Scale Machine Learning. OSDI 2016: 265-283
  • Ando, R., Takefuji, Y.: A Constrained Recursion Algorithm for Batch Normalization of Tree Structured LSTM. https://arxiv.org/abs/2008.09409
  • Andreas Veit, Michael J. Wilber, Serge J. Belongie: Residual Networks Behave Like Ensembles of Relatively Shallow Networks. NIPS 2016: 550-558
  • David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams: Learning representations by back-propagating errors. Nature 323: 533-536 (1986)
  • B.T. Polyak, Some methods of speeding up the convergence of iteration methods, USSR Computational Mathematics and Mathematical Physics Volume 4, Issue 5, 1964, Pages 1-17
  • Geoffrey Hinton: Neural Networks for Machine Learning, online course. https://www.coursera.org/learn/neural-networks/home/welcome
  • Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, James Bergstra, Ian J. Goodfellow, Arnaud Bergeron, Nicolas Bouchard, David Warde-Farley, Yoshua Bengio: Theano: new features and speed improvements. CoRR abs/1211.5590 (2012)
  • Y-Lan Boureau, Nicolas Le Roux, Francis R. Bach, Jean Ponce, Yann LeCun: Ask the locals: Multi-way local pooling for image recognition. ICCV 2011: 2651-2658
  • Y-Lan Boureau, Jean Ponce, Yann LeCun: A Theoretical Analysis of Feature Pooling in Visual Recognition. ICML 2010: 111-118
  • Brownlee, J.: A Gentle Introduction to the Rectified Linear Unit (ReLU). Machine Learning Mastery, 2021
  • Duchi, J., Hazan, E., Singer, Y.: Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. Journal of Machine Learning Research 12: 2121-2159 (2011)
  • Frosst, N., Hinton, G.: Distilling a Neural Network into a Soft Decision Tree. https://arxiv.org/abs/1711.09784
  • Ioffe, S., Szegedy, C.: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv:1502.03167, 2015
  • Jia, Shelhamer, Donahue, Karayev, Long, Girshick, Guadarrama: Caffe: Convolutional Architecture for Fast Feature Embedding. CoRR abs/1408.5093
  • Kingma, D. P., Ba, J.: Adam: A Method for Stochastic Optimization. ICLR (Poster), 2015
  • Yann LeCun, Lawrence D. Jackel, Bernhard E. Boser, John S. Denker, Hans Peter Graf, Isabelle Guyon, Don Henderson, Richard E. Howard, Wayne E. Hubbard: Handwritten digit recognition: applications of neural network chips and automatic learning. IEEE Commun. Mag. 27(11): 41-46 (1989)
  • Kyung Soo Kim, Yong Suk Choi: HyAdamC: A New Adam-Based Hybrid Optimization Algorithm for Convolution Neural Networks. Sensors 21(12): 4054 (2021)

Subjects

ISSN: 2582-7626 (Online)
https://portal.issn.org/resource/ISSN/2582-7626
Retrieval Number: 100.1/ijainn.D1070063423
https://www.ijainn.latticescipub.com/portfolio-item/D1070063423/
Journal Website: www.ijainn.latticescipub.com
https://www.ijainn.latticescipub.com/
Publisher: Lattice Science Publication (LSP)
https://www.latticescipub.com/