Published April 18, 2024 | Version v1
Dissertation | Open Access

Investigating a Second-Order Optimization Strategy for Neural Networks

  • University of St. Gallen

Description

In summary, this cumulative dissertation investigates the application of the conjugate gradient method (CG) to the optimization of artificial neural networks (NNs) and compares it with common first-order optimization methods, in particular stochastic gradient descent (SGD).
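As a rough illustration of the general idea (not the dissertation's actual implementation), a nonlinear conjugate gradient update of the Fletcher–Reeves type can be sketched in a few lines of NumPy; the toy least-squares objective, the fixed step size, and the function names below are assumptions chosen for brevity:

import numpy as np

def fletcher_reeves_cg(grad_fn, x0, steps=50, lr=1e-2):
    """Minimal nonlinear CG (Fletcher-Reeves) with a fixed step size.
    Illustrative only: practical implementations use a line search."""
    x = x0.astype(np.float64)              # 64-bit arithmetic, as recommended in the thesis
    g = grad_fn(x)
    d = -g                                 # first direction: steepest descent
    for _ in range(steps):
        x = x + lr * d                     # move along the current search direction
        g_new = grad_fn(x)
        beta = (g_new @ g_new) / (g @ g)   # Fletcher-Reeves coefficient
        d = -g_new + beta * d              # new direction, conjugate to the previous one
        g = g_new
    return x

# Toy usage: minimize ||A x - b||^2 for a small random least-squares problem.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(20, 5)), rng.normal(size=20)
grad = lambda x: 2.0 * A.T @ (A @ x - b)
x_opt = fletcher_reeves_cg(grad, np.zeros(5))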


The presented research results show that CG can effectively optimize both small and very large networks. However, the default machine precision of 32 bits can lead to problems; the best results are only achieved with 64-bit computations. The research also emphasizes the importance of the initialization of the NNs’ trainable parameters and shows that an initialization based on the singular value decomposition (SVD) leads to drastically lower error values. Surprisingly, shallow but wide NNs, in both Transformer and CNN architectures, often perform better than their deeper counterparts. Overall, the results suggest re-evaluating the prevailing preference for extremely deep NNs and highlight the potential of CG as an optimization method.
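One common way to realize an SVD-based initialization is to orthogonalize a random weight matrix via its singular vectors, so that all singular values of the initial weights are equal; the sketch below illustrates this variant and is not necessarily the exact procedure used in the dissertation:

import numpy as np

def svd_orthogonal_init(fan_in, fan_out, gain=1.0, dtype=np.float64):
    """Illustrative SVD-based initializer: draw a Gaussian matrix and keep
    only its orthogonal factor, so every singular value equals `gain`."""
    w = np.random.default_rng().normal(size=(fan_in, fan_out))
    u, _, vt = np.linalg.svd(w, full_matrices=False)
    q = u if u.shape == (fan_in, fan_out) else vt   # pick the factor with the right shape
    return (gain * q).astype(dtype)

W = svd_orthogonal_init(784, 256)   # e.g. the first layer of a small MLP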

Files

Dissertation_Bermeitinger.pdf (631.9 kB)
md5:1525cde5aadd1c480bd0614b42549772

Additional details

Dates

Valid: 2024-04-18