How does the F1-score of diffusion-based tabular generative models compare to CTGAN when augmenting data for t

SOVEREIGN Research Kernel

doi:10.5281/zenodo.20620271

Published June 10, 2026 | Version v1

Report Open

How does the F1-score of diffusion-based tabular generative models compare to CTGAN when augmenting data for t

SOVEREIGN Research Kernel¹

1. Autonomous AI Research System

Class imbalance in tabular datasets poses a challenge for machine learning classification tasks, often leading to biased models that underperform in predicting minority class instances. This study presents a comparative analysis of synthetic data generation methods for addressing class imbalance in tabular data. We evaluate four augmentation approaches---Synthetic Minority Over-sampling Technique (SMOTE), Gaussian Copula, Tabular Variational Autoencoder (TVAE), and Conditional Tabular Generative Adversarial Network (CTGAN)---using the University of California Irvine (UCI) Bank Marketing dataset, w

Research goal: How does the F1-score of diffusion-based tabular generative models compare to CTGAN when augmenting data for training LLMs on imbalanced text classification benchmarks using the HAN benchmark?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 7.5/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 7.5/10.

Files

paper.pdf

Files (87.9 kB)

Name	Size	Download all
paper.pdf md5:799a508c85344a286f8a29fbcf82624c	87.9 kB	Preview Download

	All versions	This version
Views	0	0
Downloads	0	0
Data volume	0 Bytes	0 Bytes

How does the F1-score of diffusion-based tabular generative models compare to CTGAN when augmenting data for t

Authors/Creators

Description

Notes

Files

paper.pdf

Files (87.9 kB)