There is a newer version of the record available.

Published November 21, 2023 | Version 1
Dataset Open

Aberrant gene expression prediction benchmark based on GTEx v8

  • 1. Technical University Munich

Description

This repository contains the aberrant gene expression prediction benchmark data, along with the expected gene expression across tissues and the tissue-specific isoform contribution scores required for AbExp prediction.

The aberrant gene expression prediction benchmark data (aberrant_expression_prediction_benchmark.parquet) contains the following columns:

  • individual: GTEx individual
  • gene: Ensembl gene identifier
  • tissue: GTEx tissue
  • tissue_type: GTEx tissue type
  • mu: OUTRIDER-estimated expected gene expression
  • theta: OUTRIDER-estimated gene dispersion
  • counts: Raw gene expression count
  • normalized_counts: OUTRIDER-normalized gene expression count
  • l2fc: log2 fold change between observed and expected gene expression count
  • zscore: z-score of gene expression, obtained by quantile-mapping the OUTRIDER-estimated distribution to the standard normal distribution
  • nominal_pvalue: OUTRIDER-estimated p-value of being an expression outlier
  • FDR: FDR-adjusted p-value of being an expression outlier
  • is_in_benchmark: Whether this observation is part of the aberrant gene expression prediction benchmark
  • is_underexpressed_outlier: Whether this observation is an underexpression outlier at FDR < 20%. This is the benchmark prediction label.
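
As a rough illustration of how the zscore and is_underexpressed_outlier columns relate to mu, theta, counts and FDR, here is a minimal sketch. It assumes a negative binomial with mean mu and dispersion theta (variance mu + mu^2/theta), maps the cumulative probability of the observed count onto the standard normal, and treats a negative z-score as "underexpressed". The function names, the clipping constant and the handling of discrete counts are illustrative assumptions, not the procedure used to build the benchmark.

```python
import math
from statistics import NormalDist


def nb_cdf(count, mu, theta):
    """P(K <= count) for a negative binomial with mean mu and dispersion theta.

    Assumed parametrization: variance = mu + mu**2 / theta.
    """
    log_p = math.log(theta / (theta + mu))
    log_q = math.log(mu / (theta + mu))
    total = 0.0
    for k in range(int(count) + 1):
        log_pmf = (
            math.lgamma(k + theta) - math.lgamma(k + 1) - math.lgamma(theta)
            + theta * log_p + k * log_q
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


def quantile_mapped_zscore(count, mu, theta, eps=1e-12):
    """Map the NB cumulative probability of `count` onto the standard normal.

    The probability is clipped to (eps, 1 - eps) so the inverse normal CDF
    stays finite; this clipping is an assumption of the sketch.
    """
    p = min(max(nb_cdf(count, mu, theta), eps), 1.0 - eps)
    return NormalDist().inv_cdf(p)


def is_underexpressed_outlier(fdr, zscore, fdr_cutoff=0.2):
    """Label sketch: significant at the FDR cutoff and below expectation."""
    return fdr < fdr_cutoff and zscore < 0


# Hypothetical observation: 10 reads where 100 are expected.
z = quantile_mapped_zscore(10, mu=100.0, theta=10.0)
label = is_underexpressed_outlier(fdr=0.05, zscore=z)
```

In practice the precomputed zscore, FDR and is_underexpressed_outlier columns should be used directly, restricted to rows where is_in_benchmark is true; the sketch only makes the column definitions concrete.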


The isoform proportions table (gtex_v8_isoform_proportions.tsv) contains the following columns:

  • gene: Ensembl gene identifier
  • tissue_type: GTEx tissue type
  • tissue: GTEx tissue
  • transcript: Ensembl transcript identifier
  • mean_transcript_proportions: mean transcript proportions across individuals in GTEx v8
  • median_transcript_proportions: median transcript proportions across individuals in GTEx v8
  • sd_transcript_proportions: standard deviation of transcript proportions across individuals in GTEx v8
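
One possible way to read this table is sketched below, using only the column names listed above. It assumes a tab-separated file with a header row and numeric proportion values; the function name and the example rows (including all identifiers) are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def load_mean_proportions(handle):
    """Return {(gene, tissue): {transcript: mean proportion}} from the TSV."""
    table = defaultdict(dict)
    for row in csv.DictReader(handle, delimiter="\t"):
        key = (row["gene"], row["tissue"])
        table[key][row["transcript"]] = float(row["mean_transcript_proportions"])
    return dict(table)


# Made-up rows with placeholder identifiers, only to show the expected layout.
EXAMPLE_TSV = (
    "gene\ttissue_type\ttissue\ttranscript\tmean_transcript_proportions\t"
    "median_transcript_proportions\tsd_transcript_proportions\n"
    "GENE_A\tTYPE_X\tTISSUE_X1\tTX_A1\t0.75\t0.8\t0.1\n"
    "GENE_A\tTYPE_X\tTISSUE_X1\tTX_A2\t0.25\t0.2\t0.1\n"
)

proportions = load_mean_proportions(io.StringIO(EXAMPLE_TSV))

# Transcript with the largest mean proportion for one gene/tissue pair.
dominant = max(proportions[("GENE_A", "TISSUE_X1")].items(), key=lambda kv: kv[1])
```

For the real file, the same function could be called on `open("gtex_v8_isoform_proportions.tsv", newline="")`; missing or non-numeric values, if present, would need extra handling.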


The expected gene expression table (gtex_v8_expected_expression.tsv) contains the following columns:

  • gene: Ensembl gene identifier
  • tissue_type: GTEx tissue type
  • tissue: GTEx tissue
  • gene_is_expressed: Whether the gene is expressed in the tissue
  • median_expression: median OUTRIDER-estimated expected gene expression (mu) across individuals
  • expression_dispersion: OUTRIDER-estimated gene dispersion (theta)
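
The sketch below shows one way this table might be used: collecting the expressed genes per tissue together with their expected expression and dispersion. The encoding of gene_is_expressed in the file is not specified here, so the accepted "true" spellings are an assumption, as are the function names and the made-up example rows. The variance helper assumes a negative binomial with mean mu and dispersion theta.

```python
import csv
import io
from collections import defaultdict

# Assumed spellings of a true value; adjust to the actual file contents.
TRUE_VALUES = {"true", "t", "1", "yes"}


def expressed_genes_by_tissue(handle):
    """Return {tissue: {gene: (median_expression, expression_dispersion)}}.

    Only rows flagged as expressed are kept.
    """
    genes = defaultdict(dict)
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["gene_is_expressed"].strip().lower() in TRUE_VALUES:
            genes[row["tissue"]][row["gene"]] = (
                float(row["median_expression"]),
                float(row["expression_dispersion"]),
            )
    return dict(genes)


def nb_variance(mu, theta):
    """Variance under the assumed parametrization: mu + mu**2 / theta."""
    return mu + mu * mu / theta


# Made-up rows with placeholder identifiers, only to show the expected layout.
EXAMPLE_TSV = (
    "gene\ttissue_type\ttissue\tgene_is_expressed\tmedian_expression\t"
    "expression_dispersion\n"
    "GENE_A\tTYPE_X\tTISSUE_X1\tTrue\t100.0\t10.0\n"
    "GENE_B\tTYPE_X\tTISSUE_X1\tFalse\t0.5\t2.0\n"
)

expressed = expressed_genes_by_tissue(io.StringIO(EXAMPLE_TSV))
mu, theta = expressed["TISSUE_X1"]["GENE_A"]
variance = nb_variance(mu, theta)
```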

Files (5.3 GB)

  • md5:31e34226ef8a1ab058bdcfb7525d693f (4.2 GB)
  • md5:84afc5e9d5922776fc7b4e541445711f (100.3 MB)
  • md5:22285ef968fd38e1e177e4031cf6d827 (932.8 MB)
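
After downloading, a file can be checked against the md5 checksums listed under Files. The following is a minimal sketch using Python's standard hashlib module; the function names and the local path are placeholders, and the file is read in chunks because the largest download is several gigabytes.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or plain hex."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `matches_checksum("downloaded_file", "md5:84afc5e9d5922776fc7b4e541445711f")` would return True only if the local file (placeholder path) is the one carrying that checksum.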