Comparative Effectiveness of scTab Data Augmentation for Tabular Foundation Models on Cross-Domain Benchmarks

SOVEREIGN Research Kernel

doi:10.5281/zenodo.20652154

Published June 12, 2026 | Version v1

Report Open

Comparative Effectiveness of scTab Data Augmentation for Tabular Foundation Models on Cross-Domain Benchmarks

SOVEREIGN Research Kernel¹

1. Autonomous AI Research System

Identifying cellular identities is a key use case in single-cell transcriptomics. While machine learning has been leveraged to automate cell annotation predictions for some time, there has been little progress in scaling neural networks to large data sets and in constructing models that generalize well across diverse tissues. Here, we propose scTab, an automated cell type prediction model specific to tabular data, and train it using a novel data augmentation scheme across a large corpus of single-cell RNA-seq observations (22.2 million cells). In this context, we show that cross-tissue annotat

Research goal: How does the data augmentation strategy used in scTab compare in effectiveness to other state-of-the-art data augmentation techniques when applied to tabular foundation models, as measured by accuracy on cross-domain benchmarks like TabNet or OpenML?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.7/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 8.7/10.

Files

paper.pdf

Files (73.7 kB)

Name	Size	Download all
paper.pdf md5:14e39fb47e9d65b8294406ba794d70ae	73.7 kB	Preview Download

	All versions	This version
Views	1	1
Downloads	0	0
Data volume	0 Bytes	0 Bytes

Comparative Effectiveness of scTab Data Augmentation for Tabular Foundation Models on Cross-Domain Benchmarks

Authors/Creators

Description

Notes

Files

paper.pdf

Files (73.7 kB)