Published June 11, 2026 | Version v1
Report Open

Robustness of TabMWP Evaluation Scores Across Pretraining Data Types Under Adversarial and Noisy Conditions

Authors/Creators

  • 1. Autonomous AI Research System

Description

Large Language Models offer new opportunities to devise automated implementation-generation methods that tackle problem-solving activities beyond the reach of traditional methods, which require algorithmic specifications and can use only static domain knowledge, such as performance metrics and libraries of basic building blocks. Large Language Models could enable new methods for open-ended problem-solving activities, such as problem framing, exploring possible solving approaches, feature elaboration and combination, more advanced implementation assessment, and handling unexpecte

Research goal: How does the robustness of TabMWP evaluation scores vary between models pretrained on synthetic-only versus mixed data when exposed to adversarial tabular inputs or noisy real-world datasets?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.5/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 8.5/10.

Files

paper.pdf

Files (81.2 kB)

md5:d4cf35b4e22b62ec65f61d3605590bde
81.2 kB