Benchmark Multi-Omics Datasets for Methods Comparison

Odom, Gabriel; Wang, Lily

doi:10.5281/zenodo.5683002

Published November 11, 2021 | Version v1

Dataset Open


1. Florida International University
2. University of Miami

Pathway Multi-Omics Simulated Data

These are synthetic variations of the TCGA COADREAD data set (original data available at http://linkedomics.org/data_download/TCGA-COADREAD/). They serve as a comprehensive benchmark for comparing multi-omics tools in the manuscript "pathwayMultiomics: An R package for efficient integrative analysis of multi-omics datasets with matched or un-matched samples".

There are 100 sets of random modifications to centred and scaled copy number, gene expression, and proteomics data, saved as compressed data files for the R programming language. The sets are stored in 100 subfolders labelled "sim001", "sim002", ..., "sim100"; the first 50 are in "pt1" and the second 50 in "pt2". Each subfolder contains the following:

(1) "indicatorMatricesXXX_ls.RDS" is a list of simple triplet matrices showing which genes (in which pathways) and which samples received the synthetic treatment, where XXX is the simulation run label (001, 002, ...).
(2) "CNV_partitionA_deltaB.RDS" is the synthetically modified copy number variation data, where A represents the proportion of genes in each gene set that receive the synthetic treatment (partition 1 is 20%, 2 is 40%, 3 is 60%, and 4 is 80%) and B is the signal strength in units of standard deviations.
(3) "RNAseq_partitionA_deltaB.RDS" is the synthetically modified gene expression data (same parameter legend as CNV).
(4) "Prot_partitionA_deltaB.RDS" is the synthetically modified protein expression data (same parameter legend as CNV).
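The naming scheme above can be sketched in a few lines of Python. This is only an illustration: the function name `simulation_files` is hypothetical, the exact formatting of the signal strength B in the file names is not stated here (so it is passed through as a string, and the value "0.1" in the usage note is made up), and the position of the "simXXX" folders inside each archive is an assumption.

```python
# Partition label -> proportion of genes per gene set that were treated.
PARTITION_PROPORTION = {1: 0.2, 2: 0.4, 3: 0.6, 4: 0.8}


def simulation_files(run, partition, delta):
    """Return the expected file names for one simulation run.

    run       -- simulation run number, 1 to 100
    partition -- partition label A, 1 to 4
    delta     -- signal strength B exactly as it appears in the file name
                 (string; its formatting is an assumption of this sketch)
    """
    if not 1 <= run <= 100:
        raise ValueError("run must be between 1 and 100")
    if partition not in PARTITION_PROPORTION:
        raise ValueError("partition must be 1, 2, 3, or 4")

    label = f"{run:03d}"                    # 7 -> "007"
    folder = f"sim{label}"                  # "sim007"
    part = "pt1" if run <= 50 else "pt2"    # first 50 runs in pt1, rest in pt2
    suffix = f"partition{partition}_delta{delta}.RDS"

    return {
        "archive": f"pathwayMultiomics_simulation_{part}_20201201.zip",
        "indicators": f"{folder}/indicatorMatrices{label}_ls.RDS",
        "CNV": f"{folder}/CNV_{suffix}",
        "RNAseq": f"{folder}/RNAseq_{suffix}",
        "Prot": f"{folder}/Prot_{suffix}",
    }
```

For example, `simulation_files(7, 2, "0.1")["CNV"]` gives "sim007/CNV_partition2_delta0.1.RDS", and any run above 50 maps to the "pt2" archive. The .RDS files themselves are meant to be read in R (for instance with `readRDS()`).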

Supplemental Files

The file "cluster_pathway_collection_20201117.gmt" is the collection of gene sets used for the simulation study, in Gene Matrix Transpose format. Scripts to create and analyze these data sets are available at: https://github.com/TransBioInfoLab/pathwayMultiomics_manuscript_supplement
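A minimal sketch of reading a Gene Matrix Transpose file in Python, assuming the usual GMT layout of one gene set per line with tab-separated fields (set name, description, then gene identifiers). The function name `parse_gmt` is hypothetical, and this is not the code used in the manuscript.

```python
def parse_gmt(lines):
    """Parse GMT-formatted lines into {gene_set_name: [gene, ...]}.

    Assumes each line is: name <TAB> description <TAB> gene1 <TAB> gene2 ...
    Lines with fewer than three fields are skipped.
    """
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue
        name, _description, *genes = fields
        # Drop empty fields left by trailing tabs.
        gene_sets[name] = [gene for gene in genes if gene]
    return gene_sets
```

With the file downloaded locally, it could be read with `with open("cluster_pathway_collection_20201117.gmt") as fh: gene_sets = parse_gmt(fh)`.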

Notes

Research funded by: NIA 1RF1AG061127-01 (Wang, PI)

Files


Files (17.2 GB)

Name	MD5	Size
cluster_pathway_collection_20201117.gmt	560d76f0eda791ce9d4a31c155b8fd16	11.1 kB
pathwayMultiomics_simulation_pt1_20201201.zip	5569d9f7b88e8a277362168eb1bbb05b	8.6 GB
pathwayMultiomics_simulation_pt2_20201201.zip	80f302713fcaaf32f7cca1958eca7337	8.6 GB
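Because the archives are large, it may be worth checking a download against the MD5 checksums listed above. The following is a sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `verify_download` are hypothetical, and the file is read in chunks so that an 8.6 GB archive does not need to fit in memory.

```python
import hashlib
import os

# MD5 checksums as listed for this record.
EXPECTED_MD5 = {
    "cluster_pathway_collection_20201117.gmt": "560d76f0eda791ce9d4a31c155b8fd16",
    "pathwayMultiomics_simulation_pt1_20201201.zip": "5569d9f7b88e8a277362168eb1bbb05b",
    "pathwayMultiomics_simulation_pt2_20201201.zip": "80f302713fcaaf32f7cca1958eca7337",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Return True if the file at `path` matches its listed checksum."""
    expected = EXPECTED_MD5[os.path.basename(path)]
    return md5_of_file(path) == expected
```

Calling `verify_download("pathwayMultiomics_simulation_pt1_20201201.zip")` on a local copy returns False if the download is incomplete or corrupted.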

