Published April 18, 2024 | Version v1
Dissertation | Open Access

Investigating a Second-Order Optimization Strategy for Neural Networks

  • University of St. Gallen

Description

In summary, this cumulative dissertation investigates the application of the conjugate gradient method (CG) to the optimization of artificial neural networks (NNs) and compares it with common first-order optimization methods, in particular stochastic gradient descent (SGD).
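As a rough illustration of the general idea (not the dissertation's actual implementation), a nonlinear conjugate gradient update of the Fletcher–Reeves type can be sketched in a few lines of NumPy; the toy least-squares objective, the fixed step size, and the function names below are assumptions chosen for brevity:

import numpy as np

def fletcher_reeves_cg(grad_fn, x0, steps=50, lr=1e-2):
    """Minimal nonlinear CG (Fletcher-Reeves) with a fixed step size.
    Illustrative only: practical implementations use a line search."""
    x = x0.astype(np.float64)              # 64-bit arithmetic, as recommended in the thesis
    g = grad_fn(x)
    d = -g                                 # first direction: steepest descent
    for _ in range(steps):
        x = x + lr * d                     # move along the current search direction
        g_new = grad_fn(x)
        beta = (g_new @ g_new) / (g @ g)   # Fletcher-Reeves coefficient
        d = -g_new + beta * d              # new direction, conjugate to the previous one
        g = g_new
    return x

# Toy usage: minimize ||A x - b||^2 for a small random least-squares problem.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(20, 5)), rng.normal(size=20)
grad = lambda x: 2.0 * A.T @ (A @ x - b)
x_opt = fletcher_reeves_cg(grad, np.zeros(5))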


The presented research results show that CG can effectively optimize both small and very large networks. However, the default machine precision of 32 bits can lead to problems; the best results are only achieved with 64-bit computations. The research also emphasizes the importance of the initialization of the NNs’ trainable parameters and shows that an initialization based on the singular value decomposition (SVD) leads to drastically lower error values. Surprisingly, shallow but wide NNs, in both Transformer and CNN architectures, often perform better than their deeper counterparts. Overall, the results suggest re-evaluating the prevailing preference for extremely deep NNs and highlight the potential of CG as an optimization method.
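One common way to realize an SVD-based initialization is to orthogonalize a random weight matrix via its singular vectors, so that all singular values of the initial weights are equal; the sketch below illustrates this variant and is not necessarily the exact procedure used in the dissertation:

import numpy as np

def svd_orthogonal_init(fan_in, fan_out, gain=1.0, dtype=np.float64):
    """Illustrative SVD-based initializer: draw a Gaussian matrix and keep
    only its orthogonal factor, so every singular value equals `gain`."""
    w = np.random.default_rng().normal(size=(fan_in, fan_out))
    u, _, vt = np.linalg.svd(w, full_matrices=False)
    q = u if u.shape == (fan_in, fan_out) else vt   # pick the factor with the right shape
    return (gain * q).astype(dtype)

W = svd_orthogonal_init(784, 256)   # e.g. the first layer of a small MLP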

Files

Dissertation_Bermeitinger.pdf (631.9 kB)
md5:1525cde5aadd1c480bd0614b42549772

Additional details

Dates

Valid: 2024-04-18