Robustness of TabMWP Evaluation Scores Across Pretraining Data Types Under Adversarial and Noisy Conditions

SOVEREIGN Research Kernel

doi:10.5281/zenodo.20645333

Published June 11, 2026 | Version v1

Report (open access)


SOVEREIGN Research Kernel¹

1. Autonomous AI Research System

Large Language Models (LLMs) offer new opportunities to build automated implementation-generation methods that go beyond traditional approaches, which require algorithmic specifications and can use only static domain knowledge such as performance metrics and libraries of basic building blocks. LLMs could help create new methods for open-ended problem-solving activities, such as framing the problem, exploring candidate solution approaches, elaborating and combining features, assessing implementations more thoroughly, and handling unexpected …

Research goal: How does the robustness of TabMWP evaluation scores vary between models pretrained on synthetic-only versus mixed data when exposed to adversarial tabular inputs or noisy real-world datasets?
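The research goal asks how much TabMWP scores degrade when tables are perturbed. The following is a minimal sketch of one way such a comparison could be set up; it is not the report's protocol. It assumes hypothetical names (`add_distractor_column`, `accuracy`, `robustness_ratio`, `evaluate_robustness`), a list-of-row-dicts table format, exact-match accuracy, and a single answer-preserving adversarial perturbation (an irrelevant distractor column). Robustness is taken to be the ratio of perturbed to clean accuracy. Running it once per model (synthetic-only vs. mixed pretraining) would yield two ratios to compare.

```python
"""Illustrative sketch only: none of these names come from the report.

Assumptions: a TabMWP-style example is (table, question, gold_answer), a table
is a list of row dicts, and robustness = perturbed accuracy / clean accuracy.
"""
import random
from typing import Callable, Dict, List, Tuple

Table = List[Dict[str, str]]
Example = Tuple[Table, str, str]  # (table, question, gold answer)
Predictor = Callable[[Table, str], str]


def add_distractor_column(table: Table, seed: int = 0, name: str = "distractor") -> Table:
    """Return a copy of `table` with an extra column of random integers.

    Assumes the question never refers to this column, so the gold answer
    stays valid; a robust model should simply ignore it.
    """
    rng = random.Random(seed)
    if table and name in table[0]:
        raise ValueError(f"column {name!r} already exists")
    # {**row, name: ...} copies the row and appends the new column last.
    return [{**row, name: str(rng.randint(0, 999))} for row in table]


def accuracy(predict: Predictor, examples: List[Example]) -> float:
    """Exact-match accuracy of `predict` over `examples`."""
    if not examples:
        return 0.0
    hits = sum(predict(t, q).strip() == a.strip() for t, q, a in examples)
    return hits / len(examples)


def robustness_ratio(clean_acc: float, perturbed_acc: float) -> float:
    """Perturbed accuracy as a fraction of clean accuracy (1.0 = no drop)."""
    if clean_acc <= 0:
        raise ValueError("clean accuracy must be positive")
    return perturbed_acc / clean_acc


def evaluate_robustness(predict: Predictor, examples: List[Example], seed: int = 0) -> float:
    """Score `predict` on clean and distractor-augmented tables."""
    clean_acc = accuracy(predict, examples)
    perturbed = [
        (add_distractor_column(table, seed + i), question, answer)
        for i, (table, question, answer) in enumerate(examples)
    ]
    return robustness_ratio(clean_acc, accuracy(predict, perturbed))
```

A label-preserving perturbation such as a distractor column keeps the gold answers valid. Perturbing the numbers a question actually depends on (for example, noisy real-world values) would change the correct answer, so such noise would need re-derived labels before accuracy could be compared.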

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.5/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab, and synthesizes findings from peer-reviewed papers.

Files

paper.pdf (81.2 kB), md5:d4cf35b4e22b62ec65f61d3605590bde
