Investigating a Second-Order Optimization Strategy for Neural Networks
Description
This cumulative dissertation investigates the application of the conjugate gradient method (CG) to the optimization of artificial neural networks (NNs) and compares it with common first-order optimization methods, in particular stochastic gradient descent (SGD).
The presented results show that CG can effectively optimize both small and very large networks. However, the default machine precision of 32 bits can cause problems; the best results are achieved only with 64-bit computations. The research also emphasizes the importance of initializing the NNs' trainable parameters and shows that an initialization based on singular value decomposition (SVD) leads to drastically lower error values. Surprisingly, shallow but wide NNs, in both Transformer and CNN architectures, often outperform their deeper counterparts. Overall, the results suggest re-evaluating the prevailing preference for extremely deep NNs and highlight the potential of CG as an optimization method.
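As a rough illustration of the techniques named above (not code from the dissertation itself), the following Python sketch draws an SVD-based, semi-orthogonal weight initialization, works entirely in 64-bit precision, and fits a toy one-layer network with SciPy's nonlinear conjugate gradient. All shapes, data, and hyperparameters are illustrative assumptions.

```python
# Illustrative sketch: SVD initialization + nonlinear CG on a toy network.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy regression data, kept in 64-bit floats as the abstract recommends.
X = rng.standard_normal((256, 32)).astype(np.float64)
W_true = rng.standard_normal((32, 8)).astype(np.float64)
Y = np.tanh(X @ W_true)

def svd_init(shape):
    """Initialize weights from the orthonormal SVD factors of a random matrix,
    so the initial weight matrix has unit singular values (one possible form
    of SVD-based initialization, assumed here for illustration)."""
    A = rng.standard_normal(shape)
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    return (U @ Vt).astype(np.float64)

W0 = svd_init((32, 8))

def loss(w_flat):
    W = w_flat.reshape(32, 8)
    return np.mean((np.tanh(X @ W) - Y) ** 2)

def grad(w_flat):
    W = w_flat.reshape(32, 8)
    pre = X @ W
    err = np.tanh(pre) - Y
    # Chain rule through tanh; factor 2 / Y.size matches the mean-squared error.
    return (X.T @ (err * (1.0 - np.tanh(pre) ** 2)) * (2.0 / Y.size)).ravel()

# SciPy's "CG" method is a nonlinear conjugate gradient (Polak-Ribiere) solver.
result = minimize(loss, W0.ravel(), jac=grad, method="CG",
                  options={"maxiter": 200})
print("final loss:", result.fun)
```

The sketch only shows the mechanics; the dissertation's actual experiments concern full Transformer and CNN models rather than this toy setup.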
Files
Dissertation_Bermeitinger.pdf (631.9 kB)
md5:1525cde5aadd1c480bd0614b42549772
Additional details
Dates
- Valid: 2024-04-18