Published June 30, 2023 | License: CC BY-NC-ND 4.0
Journal article | Open Access

Characterizing Adaptive Optimizer in CNN by Reverse Mode Differentiation from Full-Scratch

  • 1. National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, Japan.
  • 2. Musashino University, Department of Data Science, 3-3-3 Ariake, Koto-ku, Tokyo, Japan.

Contributors

Contact person:

  • 1. National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, Japan.

Description

Abstract: Recently, datasets have been found on which adaptive optimizers offer no clear advantage, yet no evaluation criteria have been established for deciding which optimization algorithm is appropriate. In this paper, we propose a characterization method that implements reverse-mode (backward) automatic differentiation and characterizes the optimizer by tracking, at each epoch, the gradient and the value of the signal flowing into the output layer. The proposed method was applied to a CNN (Convolutional Neural Network) recognizing CIFAR-10, and experiments were conducted comparing Adam (adaptive moment estimation) and SGD (stochastic gradient descent). The experiments revealed that, for batch sizes of 50, 100, 150, and 200, SGD and Adam differ significantly in the characteristics of the time series of signals sent to the output layer. This shows that the Adam optimizer can be clearly characterized from the input signal series for each batch size.
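
For illustration only (this is not the authors' implementation), the following minimal NumPy sketch shows the kind of bookkeeping the abstract describes: a hand-written reverse-mode (backward) pass through a small network, trained with either SGD or Adam, while recording the mean absolute value of the signal entering the output layer at each epoch. The synthetic data, two-layer fully connected network, and hyperparameters are assumptions standing in for the paper's CNN/CIFAR-10 setup.

# Minimal from-scratch sketch (illustrative assumption, not the paper's code):
# hand-written reverse-mode backprop for a tiny two-layer network, trained with
# either plain SGD or Adam, logging the mean |signal| entering the output layer
# once per epoch.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 32))            # synthetic inputs (stand-in for CIFAR-10 images)
y = rng.integers(0, 10, size=200)         # synthetic labels, 10 classes

def one_hot(labels, k=10):
    out = np.zeros((labels.size, k))
    out[np.arange(labels.size), labels] = 1.0
    return out

def train(optimizer="sgd", epochs=5, lr=1e-2, batch=50):
    W1 = rng.normal(scale=0.1, size=(32, 64)); b1 = np.zeros(64)
    W2 = rng.normal(scale=0.1, size=(64, 10)); b2 = np.zeros(10)
    params = [W1, b1, W2, b2]
    m = [np.zeros_like(p) for p in params]     # Adam first moments
    v = [np.zeros_like(p) for p in params]     # Adam second moments
    beta1, beta2, eps, t = 0.9, 0.999, 1e-8, 0
    signal_trace = []                          # per-epoch mean |signal| into the output layer
    for _ in range(epochs):
        for i in range(0, len(X), batch):
            xb, yb = X[i:i + batch], one_hot(y[i:i + batch])
            # forward pass
            h = np.maximum(0.0, xb @ W1 + b1)  # ReLU hidden layer
            z = h @ W2 + b2                    # signal flowing into the output layer
            probs = np.exp(z - z.max(axis=1, keepdims=True))
            probs /= probs.sum(axis=1, keepdims=True)
            # reverse-mode (backward) pass, written out by hand
            dz = (probs - yb) / len(xb)        # softmax + cross-entropy gradient
            dh = (dz @ W2.T) * (h > 0)         # back through the ReLU
            grads = [xb.T @ dh, dh.sum(axis=0), h.T @ dz, dz.sum(axis=0)]
            t += 1
            for j, (p, g) in enumerate(zip(params, grads)):
                if optimizer == "adam":
                    m[j] = beta1 * m[j] + (1 - beta1) * g
                    v[j] = beta2 * v[j] + (1 - beta2) * g * g
                    m_hat = m[j] / (1 - beta1 ** t)
                    v_hat = v[j] / (1 - beta2 ** t)
                    p -= lr * m_hat / (np.sqrt(v_hat) + eps)
                else:
                    p -= lr * g                # plain SGD step
        # record the mean absolute pre-softmax signal over the whole dataset
        h_all = np.maximum(0.0, X @ W1 + b1)
        signal_trace.append(float(np.mean(np.abs(h_all @ W2 + b2))))
    return signal_trace

print("SGD :", train("sgd"))
print("Adam:", train("adam"))

Comparing the two printed traces epoch by epoch is the kind of time-series comparison the abstract refers to; the batch size argument can be varied (e.g. 50, 100, 150, 200) to reproduce the per-batch-size comparison in miniature.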

Notes

Published By: Lattice Science Publication (LSP) © Copyright: All rights reserved.

Files

D1070063423.pdf (684.7 kB)
md5:aa4827e57ecbcef58c5fa1c1f275782e

Additional details

Related works

Is cited by: Journal article (ISSN 2582-7626)

References

  • Ashia C. Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, Benjamin Recht: The Marginal Value of Adaptive Gradient Methods in Machine Learning. CoRR abs/1705.08292 (2017)
  • PyTorch. https://github.com/pytorch/pytorch
  • Martin Abadi et al.: TensorFlow: A System for Large-Scale Machine Learning. OSDI 2016: 265-283
  • Ando, R., Takefuji, Y.: A Constrained Recursion Algorithm for Batch Normalization of Tree Structured LSTM. https://arxiv.org/abs/2008.09409
  • Andreas Veit, Michael J. Wilber, Serge J. Belongie: Residual Networks Behave Like Ensembles of Relatively Shallow Networks. NIPS 2016: 550-558
  • David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams: Learning representations by back-propagating errors. Nature 323: 533-536 (1986)
  • B.T. Polyak, Some methods of speeding up the convergence of iteration methods, USSR Computational Mathematics and Mathematical Physics Volume 4, Issue 5, 1964, Pages 1-17
  • Geoffrey Hinton: Neural Networks for Machine Learning, online course. https://www.coursera.org/learn/neural-networks/home/welcome
  • Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, James Bergstra, Ian J. Goodfellow, Arnaud Bergeron, Nicolas Bouchard, David Warde-Farley, Yoshua Bengio: Theano: new features and speed improvements. CoRR abs/1211.5590 (2012)
  • Y-Lan Boureau, Nicolas Le Roux, Francis R. Bach, Jean Ponce, Yann LeCun: Ask the locals: Multi-way local pooling for image recognition. ICCV 2011: 2651-2658
  • Y-Lan Boureau, Jean Ponce, Yann LeCun: A Theoretical Analysis of Feature Pooling in Visual Recognition. ICML 2010: 111-118
  • Brownlee, J.: A Gentle Introduction to the Rectified Linear Unit (ReLU). Machine Learning Mastery, 2021
  • Duchi, J., Hazan, E., Singer, Y.: Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. Journal of Machine Learning Research 12: 2121-2159 (2011)
  • Frosst, N., Hinton, G.: Distilling a Neural Network into a Soft Decision Tree. https://arxiv.org/abs/1711.09784
  • Ioffe, S., Szegedy, C.: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv:1502.03167, 2015
  • Jia, Shelhamer, Donahue, Karayev, Long, Girshick, Guadarrama: Caffe: Convolutional Architecture for Fast Feature Embedding. CoRR abs/1408.5093
  • Kingma, D. P., Ba, J.: Adam: A Method for Stochastic Optimization. ICLR (Poster), 2015
  • Yann LeCun, Lawrence D. Jackel, Bernhard E. Boser, John S. Denker, Hans Peter Graf, Isabelle Guyon, Don Henderson, Richard E. Howard, Wayne E. Hubbard: Handwritten digit recognition: applications of neural network chips and automatic learning. IEEE Commun. Mag. 27(11): 41-46 (1989)
  • Kyung Soo Kim, Yong Suk Choi: HyAdamC: A New Adam-Based Hybrid Optimization Algorithm for Convolution Neural Networks. Sensors 21(12): 4054 (2021)

Subjects

ISSN: 2582-7626 (Online)
https://portal.issn.org/resource/ISSN/2582-7626
Retrieval Number: 100.1/ijainn.D1070063423
https://www.ijainn.latticescipub.com/portfolio-item/D1070063423/
Journal Website: www.ijainn.latticescipub.com
https://www.ijainn.latticescipub.com/
Publisher: Lattice Science Publication (LSP)
https://www.latticescipub.com/