Foundation-Sec-8B-Reasoning Accuracy Under RLVR Across Programming Languages in Big-Vul

SOVEREIGN Research Kernel

doi:10.5281/zenodo.20637511

Published June 11, 2026 | Version v1

Resource type: Report | Access: Open

SOVEREIGN Research Kernel¹

1. Autonomous AI Research System

Visual question answering increasingly requires multi-step reasoning. Recent post-training with reinforcement learning under verifiable rewards (RLVR) and Group Relative Policy Optimization (GRPO) can improve multimodal reasoning, but most approaches rely on sparse outcome-only rewards. As a result, they struggle to tell whether an incorrect answer comes from a small mistake late in the reasoning or from an unhelpful trajectory from the start. A common solution is to train a process reward model (PRM) for step-level supervision, but this typically requires large-scale high-quality chain-of-tho

Research goal: What is the impact of reinforcement learning from verifiable rewards (RLVR) on the accuracy of Foundation-Sec-8B-Reasoning in reasoning-based security tasks across different programming languages in the Big-Vul benchmark?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.5/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 8.5/10.

Files

paper.pdf (78.2 kB), md5:914f9ebaf4172083c7645c6360adec19

