Scalability of Overlap-Aware Synthesis Methods for Large Tabular Datasets Regarding Training Time and FID
Description
Synthetic data generation has emerged as a promising solution to the challenges posed by data scarcity and privacy concerns, and to the need to train artificial intelligence (AI) algorithms on unbiased data with sufficient sample size and statistical power. Our review explores the application and efficacy of synthetic data methods in healthcare, considering the diversity of medical data. To this end, we systematically searched the PubMed and Scopus databases, focusing on tabular, imaging, radiomics, time-series, and omics data. Studies involving m
Research goal: How scalable are overlap-aware synthesis methods when applied to large tabular datasets (e.g., 1M+ samples) in terms of training time and generated sample quality, as measured by Fréchet Inception Distance (FID) for tabular data?
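To make the quality metric in the research goal concrete, the following is a minimal sketch of a Fréchet distance computed directly on numeric table columns. It is an assumption rather than the report's method: FID proper compares Inception-network embeddings of images, and the record does not say which feature space is used for tabular data. The function name `frechet_distance` and the choice to use raw numeric features are hypothetical. The sketch fits a Gaussian to each sample set and evaluates $\lVert \mu_r - \mu_s \rVert^2 + \operatorname{Tr}\left(\Sigma_r + \Sigma_s - 2(\Sigma_r \Sigma_s)^{1/2}\right)$.

```python
import numpy as np


def frechet_distance(real: np.ndarray, synth: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two feature matrices.

    Both inputs are (n_samples, n_features) arrays of numeric features.
    Assumption: features are used as-is; no embedding network is applied.
    """
    mu_r, mu_s = real.mean(axis=0), synth.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(real, rowvar=False))
    cov_s = np.atleast_2d(np.cov(synth, rowvar=False))

    # Tr((cov_r cov_s)^{1/2}) is the sum of the square roots of the
    # eigenvalues of cov_r @ cov_s; clip tiny negatives from round-off.
    eigvals = np.linalg.eigvals(cov_r @ cov_s)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_s
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_s) - 2.0 * tr_sqrt)
```

In this sketch the means and covariances are the only statistics needed, so the cost grows linearly with the number of rows and quadratically with the number of features, which is the part most relevant to tables with 1M+ samples.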
Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.5/10.
Files
| Name | Size |
|---|---|
| paper.pdf (md5:c96ec2a6007eb3de7534e54c746e75af) | 76.7 kB |