GTEx V11 Tissue Classification Dataset: Image-Based Representation of Gene Expression (TPM) for Selected Human Tissues
Description
This dataset provides an image-based representation of gene expression profiles derived from the GTEx (Genotype-Tissue Expression) Project, version 11. It was generated from publicly available RNA-seq data, specifically the gene-level TPM expression matrix and corresponding sample annotations. A subset of samples was selected from seven human tissues (Brain - Cortex, Heart - Left Ventricle, Liver, Lung, Muscle - Skeletal, Adipose - Subcutaneous, and Skin - Sun Exposed), with up to 200 samples per tissue retained when available. Only samples present in both the expression matrix and annotation file were included, resulting in a final dataset of 587 samples.
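The selection step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code used to build the dataset: the function name `select_samples` and its inputs are hypothetical, the tissue strings are copied from this description and may not match the annotation file's labels exactly, and taking the first 200 matching samples is an assumption, since the description does not say how samples were chosen within a tissue.

```python
from collections import defaultdict

# Tissue names as written in this description (the exact strings in the
# annotation file may differ).
TISSUES = [
    "Brain - Cortex",
    "Heart - Left Ventricle",
    "Liver",
    "Lung",
    "Muscle - Skeletal",
    "Adipose - Subcutaneous",
    "Skin - Sun Exposed",
]


def select_samples(sample_tissue, expression_ids, tissues=TISSUES, max_per_tissue=200):
    """Return {tissue: [sample_id, ...]} with at most max_per_tissue IDs each.

    sample_tissue  : dict mapping sample ID -> tissue label (from the annotations)
    expression_ids : iterable of sample IDs present in the expression matrix
    """
    available = set(expression_ids)
    selected = defaultdict(list)
    for sample_id, tissue in sample_tissue.items():
        # Keep only the chosen tissues and samples found in both sources.
        if tissue not in tissues or sample_id not in available:
            continue
        # Assumption: keep the first samples encountered, up to the cap.
        if len(selected[tissue]) < max_per_tissue:
            selected[tissue].append(sample_id)
    return dict(selected)
```

Tissues with fewer matching samples than the cap simply contribute all of them, which is consistent with the final count being lower than 7 × 200.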
For each sample, the gene expression values (~74,000 genes) were log-transformed with log1p, and the resulting vector was zero-padded so that it could be reshaped into a square matrix and stored as a grayscale image. Each image therefore encodes the full transcriptome of one sample and is suitable for image-based machine learning workflows. The dataset includes a compressed archive of images (selected_gtex_v11_tpm_image_tissue_dataset.zip) and a corresponding label file (selected_gtex_v11_tpm_image_tissue_labels.csv) that maps each image to its tissue type.
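The encoding described above can be sketched as follows. This is a minimal illustration rather than the code used to build the dataset: the function name `expression_to_image` is hypothetical, padding to the smallest square that fits the vector is assumed, and the per-image scaling to 8-bit gray levels is also an assumption, since the description does not state how log-transformed values were mapped to pixel intensities.

```python
import math

import numpy as np


def expression_to_image(tpm):
    """Turn a 1-D vector of TPM values into a square uint8 grayscale array."""
    # log1p keeps zeros at zero and compresses the large dynamic range of TPM.
    values = np.log1p(np.asarray(tpm, dtype=np.float64))
    n = values.size

    # Smallest side length whose square can hold all n values (assumption).
    side = math.isqrt(n)
    if side * side < n:
        side += 1

    # Zero-pad the tail, then reshape row by row into a square grid.
    padded = np.zeros(side * side, dtype=np.float64)
    padded[:n] = values
    grid = padded.reshape(side, side)

    # Assumption: scale each image by its own maximum to the 0-255 range.
    peak = grid.max() if grid.size else 0.0
    if peak > 0:
        grid = grid / peak * 255.0
    return np.round(grid).astype(np.uint8)
```

With this layout, a pixel's position corresponds only to the gene's index in the expression matrix, so neighbouring pixels are not necessarily biologically related genes.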
This dataset is intended for research and benchmarking in machine learning, particularly for exploring image-based representations of high-dimensional transcriptomic data and tissue classification tasks.
Files
| Name | Size | MD5 |
|---|---|---|
| selected_gtex_v11_tpm_image_tissue_dataset.zip | 15.8 MB | 4f5b0dfe4e760ce4feeaa3ee6cf533ed |
| selected_gtex_v11_tpm_image_tissue_labels.csv | 27.0 kB | 4920f09c3810a3fedb699285397a95a9 |
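After downloading, each file can be checked against the MD5 checksum listed for it. The snippet below is a small sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_listed_md5` are hypothetical, and the local path in the usage note is an assumption about where the file was saved.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file's digest with a listed value such as 'md5:4f5b...'."""
    # Accept the checksum with or without the 'md5:' prefix.
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_md5("selected_gtex_v11_tpm_image_tissue_dataset.zip", "md5:4f5b0dfe4e760ce4feeaa3ee6cf533ed")` should return `True` for an intact download saved under that name in the working directory.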