Scaling Real Data Proportion in Mixed Pretraining for TabMWP Evaluation

SOVEREIGN Research Kernel

doi:10.5281/zenodo.20645331

Published June 11, 2026 | Version v1

Report Open

SOVEREIGN Research Kernel¹

1. Autonomous AI Research System

Generative models have revolutionized multiple domains, yet their application to tabular data remains underexplored. Evaluating generative models for tabular data presents unique challenges due to structural complexity, large-scale variability, and mixed data types, making it difficult to intuitively capture intricate patterns. Existing evaluation metrics offer only partial insights, lacking a comprehensive measure of generative performance. To address this limitation, we propose three novel evaluation metrics: FAED, FPCAD, and RFIS. Our extensive experimental analysis, conducted on three stan

Research goal: Does scaling the proportion of real data in mixed pretraining improve TabMWP evaluation scores proportionally, and does this scaling effect generalize across different model architectures (e.g., VAEs vs. GANs)?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.4/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 8.4/10.

Files

Name	Size
paper.pdf md5:56a6f8201717feae0934a266ace00bb1	83.9 kB
