Published September 19, 2025 | Version v1
Preprint | Open Access

VANTA Research Reasoning Evaluation (VRRE): A Novel Framework for Semantic-Based LLM Reasoning Assessments

  • 1. VANTA Research

Contributors

Researcher:

  • 1. VANTA Research

Description

The VANTA Research Reasoning Evaluation (VRRE) introduces a novel, semantic-based framework for assessing large language model reasoning. Unlike traditional benchmarks such as BoolQ, PIQA, or ARC, which rely on binary scoring and format-dependent outputs, VRRE evaluates answers by meaning, reasoning fidelity, and faithfulness to context.
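To make the distinction concrete, the sketch below contrasts binary exact-match scoring with a semantic-style score. This is a toy illustration only: it uses token overlap (Jaccard similarity) as a crude stand-in for semantic comparison, and the function names and example strings are hypothetical, not VRRE's actual API.

```python
def exact_match(answer: str, reference: str) -> float:
    """Binary, format-dependent scoring used by traditional benchmarks."""
    return 1.0 if answer.strip().lower() == reference.strip().lower() else 0.0

def semantic_score(answer: str, reference: str) -> float:
    """Toy semantic proxy: Jaccard overlap of lowercased tokens.
    A real semantic evaluator would compare meaning, not surface tokens."""
    a, r = set(answer.lower().split()), set(reference.lower().split())
    if not a or not r:
        return 0.0
    return len(a & r) / len(a | r)

reference = "yes because the boiling point of water is 100 degrees celsius"
answer = "yes water boils at 100 degrees celsius"

# The answer is correct in meaning but not in surface form:
print(exact_match(answer, reference))     # 0.0 -- binary scoring misses it
print(semantic_score(answer, reference))  # nonzero -- partial credit for meaning
```

The point of a meaning-based score is precisely the case shown above: an answer that is right in substance receives credit even when its wording differs from the reference, which binary scoring cannot do.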

In validation across multiple model architectures, VRRE detected a 2.5x improvement in reasoning ability that standard benchmarks failed to capture, demonstrating its ability to reveal hidden model capabilities and failure modes. This approach makes VRRE especially relevant for alignment research, safety evaluations, and real-world deployment scenarios where surface correctness is insufficient.

VRRE is released under Apache 2.0 and developed by VANTA Research (Portland, Oregon) as part of its mission to build aligned, reasoning-first AI evaluation methods.

Files

VANTA Research Reasoning Evaluation (VRRE).pdf

Size: 230.7 kB
md5: eda3546063e856c77980b04ed5452f45

Additional details

Dates

Available
2025-09-19

Software

Repository URL
https://github.com/vanta-research/vrre
Programming language
Python, Shell
Development Status
Active