Published December 19, 2025 | Version 1.0.0-alpha
Dataset Open

EVENFLOW Personalised Medicine use case: Clear Cell Renal Cell Carcinoma (ccRCC, aka KIRC) datasets collection

  • 1. Barcelona Supercomputing Center
  • 2. Institució Catalana de Recerca i Estudis Avançats

Description

These datasets contain synthetic data generated with a variational autoencoder (VAE) from the TCGA-KIRC dataset. The file Static_KIRC.csv contains a pre-processed version of the bulk RNA-Seq dataset from TCGA-KIRC. The file nodes_metadata.csv contains the patients' clinical information (columns: sample name, gender, race, and stage). The synthetic dataset is provided as a compressed archive, trajectories_forward_test.csv.tar.gz, which simulates the progression of RNA-Seq data between early-stage and late-stage patients across 50 inferred timepoints. Both the static and the trajectory datasets were analyzed for biological interpretation with a differential expression analysis (DESeq) followed by gene set enrichment analysis (GSEA) on all pathways available in the Reactome database. The results of the static analysis are in static_gsea_reports_kirc.csv, and the results for the trajectories are in trajectories_gsea_reports_kirc.csv.tar.gz.
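Because two of the files are distributed as .tar.gz archives, it can help to stream their contents rather than unpack them fully. The following is a minimal sketch using only the Python standard library. It assumes each archive holds a single CSV member and that the metadata file has a header literally named "stage"; neither detail is confirmed by this record, and the function names are hypothetical.

```python
import csv
import io
import tarfile
from collections import Counter


def iter_targz_csv_rows(archive_path):
    """Yield rows (lists of strings) from the first .csv member of a .tar.gz archive.

    Assumption: the archive contains one CSV file; only the first one found is read.
    """
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive:
            if member.isfile() and member.name.endswith(".csv"):
                raw = archive.extractfile(member)
                # Decode the binary stream lazily so large files are not loaded at once.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                yield from csv.reader(text)
                return


def count_values(csv_path, column):
    """Count how often each value occurs in one column of a plain CSV file.

    Assumption: `column` matches a header in the file exactly (e.g. "stage").
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return Counter(row[column] for row in csv.DictReader(handle))
```

For example, iterating over iter_targz_csv_rows("trajectories_forward_test.csv.tar.gz") and stopping after a few rows gives a quick look at the layout without decompressing the whole archive, and count_values("nodes_metadata.csv", "stage") would tally patients per stage if the header is spelled that way.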

The trained models used for the synthetic data generation are available on Hugging Face.

Files


Files (1.4 GB)

Checksum Size
md5:645039f1c12d67832743f742f74976f2 36.8 kB
md5:8e229e18ba1383906a87a968eda2fd15 104.5 MB
md5:66b3d6d757f7f0f7b14654940630a2e0 22.8 MB
md5:22d473df4804f44478665e0c375bd770 490.0 MB
md5:5c23d97ad37598fa2f5ae348c5952c9b 736.3 MB

Additional details

Funding

European Commission
EVENFLOW - Robust Learning and Reasoning for Complex Event Forecasting 101070430

Dates

Created
2025-12-19

Software

Repository URL
https://github.com/gprolcastelo/renalprog
Programming language
Python, R
Development Status
Active