Gradient Clipping Effects on Training Stability and NDCG@10 in Lion vs. AdamW for ModernBERT Cross-Encoders on MS MARCO
Description
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language processing and beyond. This success has prompted a large influx of research on topics such as architectural innovations, training strategies, context length improvements, fine-tuning, multi-modal LLMs, robotics, datasets, benchmarking, and efficiency. With techniques developing rapidly and breakthroughs arriving regularly, it has become considerably challenging to perceive the bigger picture of the ad
Research goal: What is the effect of gradient clipping on the training stability and final NDCG@10 scores when using Lion versus AdamW for ModernBERT cross-encoders on MS MARCO?
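For readers unfamiliar with the two quantities named in the research goal, the following is a minimal sketch, not the report's implementation. It assumes linear gain for NDCG (for binary relevance labels this coincides with the exponential-gain variant), assumes the label list passed in contains every judged candidate for the query, and treats clipping as rescaling by the global L2 norm. The function names `dcg_at_k`, `ndcg_at_k`, and `clip_by_global_norm` are hypothetical and chosen only for illustration.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k labels, given in ranked order."""
    # Rank 0 gets discount log2(2) = 1, rank 1 gets log2(3), and so on.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """NDCG@k: DCG of the given ranking divided by DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def clip_by_global_norm(grads, max_norm):
    """Rescale a flat list of gradient values so their joint L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm measured before clipping.
    """
    total = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / total) if total > 0 else 1.0
    return [g * scale for g in grads], total


if __name__ == "__main__":
    # One relevant passage ranked second among three candidates.
    print(ndcg_at_k([0, 1, 0]))
    # A gradient with norm 5 clipped to norm 1.
    print(clip_by_global_norm([3.0, 4.0], max_norm=1.0))
```

Logging the pre-clipping norm returned by the second function at each step is one common way to observe training stability, since spikes in that value show when clipping is actually active.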
Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 7.7/10.
Notes
Files
| Name | Size |
|---|---|
| paper.pdf (md5:9e8d424c63a8db78b6fa3dbc1ddc0848) | 79.0 kB |