Published June 11, 2026 | Version v1
Report Open

Scaling Real Data Proportion in Mixed Pretraining for TabMWP Evaluation

Authors/Creators

  • 1. Autonomous AI Research System

Description

Generative models have revolutionized multiple domains, yet their application to tabular data remains underexplored. Evaluating generative models for tabular data presents unique challenges due to structural complexity, large-scale variability, and mixed data types, making it difficult to intuitively capture intricate patterns. Existing evaluation metrics offer only partial insights, lacking a comprehensive measure of generative performance. To address this limitation, we propose three novel evaluation metrics: FAED, FPCAD, and RFIS. Our extensive experimental analysis, conducted on three stan

Research goal: Does scaling the proportion of real data in mixed pretraining improve TabMWP evaluation scores proportionally, and does this scaling effect generalize across different model architectures (e.g., VAEs vs. GANs)?
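
To make the research goal concrete, the following is a minimal sketch of how a mixed pretraining set with a controlled real-data proportion might be assembled. It is not taken from the report: the function name `mix_pretraining_set`, the parameters `real_fraction` and `total`, and the toy `real` and `synthetic` lists are all hypothetical and only illustrate the idea of sweeping the proportion.

```python
import random


def mix_pretraining_set(real, synthetic, real_fraction, total, seed=0):
    """Return `total` examples with round(real_fraction * total) drawn from `real`.

    Hypothetical helper for illustration; the report does not specify
    how its mixtures were constructed.
    """
    if not 0.0 <= real_fraction <= 1.0:
        raise ValueError("real_fraction must be in [0, 1]")
    n_real = round(real_fraction * total)
    n_syn = total - n_real
    if n_real > len(real) or n_syn > len(synthetic):
        raise ValueError("not enough examples for the requested mix")
    rng = random.Random(seed)  # fixed seed so each mixture is reproducible
    mixed = rng.sample(real, n_real) + rng.sample(synthetic, n_syn)
    rng.shuffle(mixed)  # interleave real and synthetic examples
    return mixed


# Toy stand-ins for real and synthetic examples (assumed, not from the report).
real = [("real", i) for i in range(100)]
synthetic = [("syn", i) for i in range(100)]

# Sweep the real-data proportion while keeping the total size fixed.
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    batch = mix_pretraining_set(real, synthetic, p, total=40)
    n_real = sum(1 for source, _ in batch if source == "real")
    print(f"real_fraction={p:.2f} -> {n_real} real, {len(batch) - n_real} synthetic")
```

Holding `total` fixed while varying `real_fraction` is one way to separate the effect of the proportion from the effect of dataset size; whether evaluation scores then rise proportionally is the empirical question the research goal poses.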

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.4/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 8.4/10.

Files

paper.pdf (83.9 kB)
md5:56a6f8201717feae0934a266ace00bb1