Signature Informed Sampling for Transcriptomic Data
- 1. ETH Zürich, IBM Research Europe
- 2. IBM Research Europe
Description
This repository contains the data and results of all experiments conducted in our work "Signature Informed Sampling for Transcriptomic Data". In this work we propose a simple, novel, non-parametric data augmentation method inspired by the concept of chromosomal crossover. We benchmark our proposed methods against random oversampling, SMOTE, modified versions of gamma-Poisson and Poisson sampling, and a baseline trained on the unbalanced (non-augmented) data.
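Of the baselines listed above, random oversampling is the simplest to state: minority-class samples are duplicated at random until every class is as large as the largest one. The sketch below illustrates that generic technique only. It is not the crossover-inspired method proposed in the study, and the function name `random_oversample` and the list-of-count-vectors input format are assumptions made for illustration.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Generic random oversampling (illustrative, not the study's code).

    Duplicates randomly chosen samples of each minority class until every
    class has as many samples as the largest class. `samples` is a list of
    per-sample count vectors and `labels` the matching class labels.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    # Start from the original data so real samples are always kept.
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            # Appends a reference to an existing real sample of this class.
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y
```

Because oversampling only repeats real samples, it adds no new variation, which is the limitation that sampling-based augmentation methods aim to address.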
The compressed file data_5x5stratified.zip contains all the data used in our experiments: the original count data on which augmentation was performed, the cross-validation split indices (as a JSON file), the training and validation data (TCGA) augmented by each of the augmentation methods in our study, a test set containing only real TCGA samples, and an external test set (CPTAC). The test and external test sets are standardised separately for each augmentation method and each CV split, with respect to the corresponding training data.
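Standardising the test and external sets "with respect to the training data" means the scaling statistics are estimated on the training data of a given split (here, after augmentation) and then reused unchanged on held-out data, so no information from the test sets leaks into preprocessing. The sketch below assumes per-gene z-scoring with the population standard deviation; the exact transformation used in the study is not stated here, and `fit_standardiser` / `apply_standardiser` are hypothetical names.

```python
from statistics import fmean, pstdev


def fit_standardiser(train):
    """Estimate per-feature mean and standard deviation from training rows only.

    Assumption: plain z-scoring; `train` is a list of equal-length numeric rows.
    """
    columns = list(zip(*train))
    means = [fmean(c) for c in columns]
    stds = [pstdev(c) for c in columns]
    return means, stds


def apply_standardiser(rows, means, stds):
    """Z-score every feature using the training statistics.

    Features that are constant in the training data (std == 0) are mapped to 0.
    """
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]
```

Under this scheme the same test set yields different standardised values for each augmentation method and split, because each (method, split) pair has its own training data and hence its own means and standard deviations.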
The compressed file 5x5_Results.zip contains the results of all experiments for all 25 (5x5) splits: the parameter files used to train the various models, the computed metrics, the latent representations of the training, validation and test sets (for VAE models), and the trained models themselves.
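Assuming "5x5" denotes five repetitions of stratified 5-fold cross-validation (which gives the 25 splits mentioned above), the sketch below shows one way such split indices can be generated and stored as JSON. It is a stdlib-only illustration: the function names and the JSON layout (keys `repeat`, `fold`, `train`, `validation`) are hypothetical and may not match the JSON file shipped in data_5x5stratified.zip.

```python
import json
import random
from collections import defaultdict


def repeated_stratified_splits(labels, n_splits=5, n_repeats=5, seed=0):
    """Return n_repeats * n_splits dicts of train/validation index lists.

    Within each repeat, the indices of every class are shuffled and dealt
    round-robin over the folds, so class proportions stay roughly equal.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    splits = []
    for repeat in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        offset = 0  # continue dealing where the previous class stopped
        for cls in sorted(by_class):
            idxs = by_class[cls][:]
            rng.shuffle(idxs)
            for i, idx in enumerate(idxs):
                folds[(i + offset) % n_splits].append(idx)
            offset += len(idxs)
        for k in range(n_splits):
            validation = sorted(folds[k])
            train = sorted(i for j, f in enumerate(folds) if j != k for i in f)
            splits.append(
                {"repeat": repeat, "fold": k, "train": train, "validation": validation}
            )
    return splits


def dump_splits(splits, path):
    """Write the split indices to a JSON file (hypothetical layout)."""
    with open(path, "w") as fh:
        json.dump(splits, fh, indent=2)
```

Fixing the seed makes the 25 splits reproducible, which is what allows augmented data, models and metrics to be matched to the same split across methods.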
Files
Total size: 4.0 GB
Size | MD5 checksum |
---|---|
2.6 GB | 879df861919e1ef3c6d9b87b4868088a |
1.5 GB | 9944aa24077cdc7ba392957669621896 |
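The MD5 checksums above can be used to confirm that a downloaded archive is complete and uncorrupted. The sketch below streams the file in 1 MiB chunks so multi-gigabyte archives need not fit in memory. Since the listing does not pair each checksum with a file name, `matches_record` (a hypothetical helper) simply checks whether a file's digest is one of the two listed values.

```python
import hashlib

# MD5 checksums listed for this record; the listing does not say which
# archive each one belongs to, so both values are accepted.
KNOWN_MD5 = {
    "879df861919e1ef3c6d9b87b4868088a",  # 2.6 GB file
    "9944aa24077cdc7ba392957669621896",  # 1.5 GB file
}


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path):
    """Return True if the file's MD5 digest is one of the listed checksums."""
    return md5sum(path) in KNOWN_MD5
```

For example, `matches_record("data_5x5stratified.zip")` should return `True` for an intact download.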
Additional details
Funding
- Swiss National Science Foundation, grant CRSII5_193832: Trans-omic approach to colorectal cancer: an integrative computational and clinical perspective