What is the impact of varying the pretraining dataset size and diversity on the cross-domain generalization ca
Description
Identifying cellular identities is a key use case in single-cell transcriptomics. While machine learning has been leveraged to automate cell annotation predictions for some time, there has been little progress in scaling neural networks to large data sets and in constructing models that generalize well across diverse tissues. Here, we propose scTab, an automated cell type prediction model specific to tabular data, and train it using a novel data augmentation scheme across a large corpus of single-cell RNA-seq observations (22.2 million cells). In this context, we show that cross-tissue annotat
Research goal: What is the impact of varying the pretraining dataset size and diversity on the cross-domain generalization capabilities of tabular foundation models, as measured by accuracy on unseen domains in benchmarks like TabNet or OpenML?
Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 7.8/10.
Notes
Files
| Name | Size | MD5 |
|---|---|---|
| paper.pdf | 72.2 kB | 5e32d769663a74481364bc77161a1722 |