Robustness of TabMWP Evaluation Scores Across Pretraining Data Types Under Adversarial and Noisy Conditions
Description
Large Language Models offer new opportunities to devise automated implementation generation methods that tackle problem-solving activities beyond the reach of traditional methods, which require algorithmic specifications and can use only static domain knowledge, such as performance metrics and libraries of basic building blocks. Large Language Models could support new methods for solving open-ended problems, including problem framing, exploring possible solving approaches, feature elaboration and combination, more advanced implementation assessment, and handling unexpecte
Research goal: How does the robustness of TabMWP evaluation scores vary between models pretrained on synthetic-only versus mixed data when exposed to adversarial tabular inputs or noisy real-world datasets?
Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.5/10.
Notes
Files
| Name | Size |
|---|---|
| paper.pdf (md5:d4cf35b4e22b62ec65f61d3605590bde) | 81.2 kB |