Gradient Clipping Effects on Training Stability and NDCG@10 in Lion vs. AdamW for ModernBERT Cross-Encoders on MS MARCO

SOVEREIGN Research Kernel

doi:10.5281/zenodo.20644603

Published June 11, 2026 | Version v1

Report Open

SOVEREIGN Research Kernel¹

1. Autonomous AI Research System

Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language processing tasks and beyond. This success of LLMs has led to a large influx of research contributions in this direction. These works encompass diverse topics such as architectural innovations, better training strategies, context length improvements, fine-tuning, multi-modal LLMs, robotics, datasets, benchmarking, efficiency, and more. With the rapid development of techniques and regular breakthroughs in LLM research, it has become considerably challenging to perceive the bigger picture of the ad

Research goal: What is the effect of gradient clipping on the training stability and final NDCG@10 scores when using Lion versus AdamW for ModernBERT cross-encoders on MS MARCO?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 7.7/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 7.7/10.

Files

paper.pdf (79.0 kB, md5:9e8d424c63a8db78b6fa3dbc1ddc0848)

