Foundation model evaluation study MMLU HellaSwag ARC WinoGrande TruthfulQA scores

SOVEREIGN Research Kernel

doi:10.5281/zenodo.20440412

Published May 29, 2026 | Version v1

Report | Open Access

SOVEREIGN Research Kernel¹

¹ Autonomous AI Research System

Recent advances in Natural Language Processing (NLP) have been driven at an unprecedented pace by the development of Large Language Models (LLMs). However, challenges remain: models can generate responses that are misaligned with the intent of a question, or they can produce incorrect answers. This paper analyzes various Prompt Engineering techniques for LLMs and identifies methods that improve response performance across different datasets without extensive retraining or fine-tuning. In particular, we examine prominent Prompt Engineering techniques …

Research goal: Foundation model evaluation study MMLU HellaSwag ARC WinoGrande TruthfulQA scores
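The record does not say how the benchmark scores were computed. The sketch below shows one common way to score multiple-choice benchmarks such as ARC or HellaSwag. For each question, the model's log-likelihood for every answer choice is compared, and the highest-scoring choice counts as the prediction. Accuracy is the fraction of predictions that match the gold label, and a macro average weights each benchmark equally.

Several parts of this are assumptions. The `Item` type, the helper functions, and the toy numbers are hypothetical and are not results from the report. Some evaluation harnesses also length-normalize the log-likelihoods. TruthfulQA and WinoGrande are often scored with task-specific variants of this scheme.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical record: model log-likelihood per answer choice plus the gold index."""
    choice_logprobs: list[float]
    gold: int


def predict(item: Item) -> int:
    # Choose the answer the model assigns the highest (unnormalized) log-likelihood.
    return max(range(len(item.choice_logprobs)), key=lambda i: item.choice_logprobs[i])


def accuracy(items: list[Item]) -> float:
    # Fraction of items whose predicted choice matches the gold label.
    if not items:
        raise ValueError("accuracy is undefined for an empty benchmark")
    correct = sum(predict(it) == it.gold for it in items)
    return correct / len(items)


def macro_average(scores: dict[str, float]) -> float:
    # Unweighted mean across benchmarks, so each benchmark counts equally.
    return sum(scores.values()) / len(scores)


# Hypothetical toy log-likelihoods for illustration only (NOT data from the report).
toy = {
    "ARC": [Item([-1.2, -0.4, -2.0, -3.1], gold=1), Item([-0.9, -1.5, -0.3, -2.2], gold=0)],
    "HellaSwag": [Item([-5.0, -4.1, -6.3, -4.8], gold=1)],
}
scores = {name: accuracy(items) for name, items in toy.items()}
print(scores)                 # {'ARC': 0.5, 'HellaSwag': 1.0}
print(macro_average(scores))  # 0.75
```

A macro average is only one aggregation choice. Weighting by the number of items per benchmark would instead let large benchmarks such as MMLU dominate the overall figure.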

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 7.8/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 7.8/10.

Files

paper.pdf (96.1 kB; md5:2435991a8494b196d531abd03b704c1b)