Scalability of Overlap-Aware Synthesis Methods for Large Tabular Datasets Regarding Training Time and FID

SOVEREIGN Research Kernel

doi:10.5281/zenodo.20654892

Published June 12, 2026 | Version v1

Report Open

SOVEREIGN Research Kernel¹

1. Autonomous AI Research System

Synthetic data generation has emerged as a promising way to overcome the challenges posed by data scarcity and privacy concerns, and to meet the need for training artificial intelligence (AI) algorithms on unbiased data with sufficient sample size and statistical power. Our review explores the application and efficacy of synthetic data methods in healthcare, considering the diversity of medical data. To this end, we systematically searched the PubMed and Scopus databases, focusing on tabular, imaging, radiomics, time-series, and omics data. Studies involving m

Research goal: How scalable are overlap-aware synthesis methods when applied to large tabular datasets (e.g., 1M+ samples) in terms of training time and generated sample quality, as measured by Fréchet Inception Distance (FID) for tabular data?
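The record does not say how FID is adapted to tabular data. As a minimal sketch, assuming the score is the Fréchet distance between Gaussians fitted to numeric feature vectors of the real and synthetic tables (the original FID uses Inception embeddings of images instead), the computation could look like the following. The function name `frechet_distance` and the use of raw numeric columns as features are illustrative assumptions, not the report's method.

```python
import numpy as np


def frechet_distance(real: np.ndarray, synth: np.ndarray) -> float:
    """Squared Frechet distance between Gaussians fitted to two sample sets.

    Assumption: rows are samples and columns are numeric features.
    """
    mu_r, mu_s = real.mean(axis=0), synth.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(real, rowvar=False))
    cov_s = np.atleast_2d(np.cov(synth, rowvar=False))

    # Tr((cov_r cov_s)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_s, which are real and non-negative for
    # positive semi-definite covariances (clipped here against round-off).
    eigvals = np.linalg.eigvals(cov_r @ cov_s)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_s
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_s) - 2.0 * trace_sqrt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(1000, 3))
    synth = rng.normal(loc=0.5, size=(1000, 3))  # hypothetical synthetic table
    print(frechet_distance(real, synth))
```

Lower values indicate closer first and second moments. For tables with 1M+ rows, the means and covariances can be accumulated in chunks, so the cost of the metric itself is dominated by a single pass over the data plus an eigendecomposition whose size depends only on the number of features.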

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.5/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 8.5/10.

Files

Files (76.7 kB)

Name	Size
paper.pdf md5:c96ec2a6007eb3de7534e54c746e75af	76.7 kB

