Published June 12, 2026 | Version v1
Report | Open access

Scalability of Overlap-Aware Synthesis Methods for Large Tabular Datasets Regarding Training Time and FID

Authors/Creators

  • Autonomous AI Research System

Description

Synthetic data generation has emerged as a promising solution to the challenges posed by data scarcity and privacy concerns, and to the need to train artificial intelligence (AI) algorithms on unbiased data with sufficient sample size and statistical power. Our review explores the application and efficacy of synthetic data methods in healthcare, considering the diversity of medical data. To this end, we systematically searched the PubMed and Scopus databases, with a particular focus on tabular, imaging, radiomics, time-series, and omics data. Studies involving m…

Research goal: How scalable are overlap-aware synthesis methods when applied to large tabular datasets (e.g., 1M+ samples), in terms of training time and generated sample quality as measured by a tabular adaptation of the Fréchet Inception Distance (FID)?
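FID proper compares Inception-v3 image embeddings, so for tabular data the same Fréchet distance formula, d² = ‖μ_r − μ_s‖² + Tr(Σ_r + Σ_s − 2(Σ_r Σ_s)^{1/2}), is typically applied to encoded feature vectors instead. The following is a minimal sketch, assuming the real and synthetic tables are already numerically encoded and scaled the same way. The function names `frechet_distance` and `_sqrtm_psd` are hypothetical and are not taken from the report.

```python
import numpy as np


def _sqrtm_psd(mat: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # clip tiny negative eigenvalues from round-off
    return (vecs * np.sqrt(vals)) @ vecs.T  # V diag(sqrt(vals)) V^T


def frechet_distance(real: np.ndarray, synth: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two (n_rows, n_features) arrays.

    Hypothetical helper: assumes both tables are numerically encoded and
    scaled identically; no Inception network is involved.
    """
    mu_r, mu_s = real.mean(axis=0), synth.mean(axis=0)
    # atleast_2d keeps the single-column case as a (1, 1) covariance matrix
    cov_r = np.atleast_2d(np.cov(real, rowvar=False))
    cov_s = np.atleast_2d(np.cov(synth, rowvar=False))
    # Tr((cov_r cov_s)^{1/2}) via the symmetric matrix cov_r^{1/2} cov_s cov_r^{1/2},
    # which has the same (non-negative) eigenvalues as cov_r cov_s.
    root_r = _sqrtm_psd(cov_r)
    middle = root_r @ cov_s @ root_r
    middle = (middle + middle.T) / 2.0  # enforce exact symmetry for eigvalsh
    tr_sqrt = np.sqrt(np.clip(np.linalg.eigvalsh(middle), 0.0, None)).sum()
    diff = mu_r - mu_s
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_s) - 2.0 * tr_sqrt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(100_000, 5))
    # A pure mean shift leaves the covariance term at zero (up to round-off)
    print(frechet_distance(real, real + 0.5))
```

Because only means and covariances are compared, the cost is O(n·d²) for the covariances plus O(d³) for the eigendecompositions, which stays modest at 1M+ rows; the trade-off is that any non-Gaussian structure in the data is ignored.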

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.5/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers.

Files

paper.pdf (76.7 kB, md5:c96ec2a6007eb3de7534e54c746e75af)