Aberrant gene expression prediction benchmark based on GTEx v8
Description
This repository contains the aberrant gene expression prediction benchmark data, together with the expected gene expression across tissues and the tissue-specific isoform contribution scores required for AbExp prediction.
The aberrant gene expression prediction benchmark data (aberrant_expression_prediction_benchmark.parquet) contains the following columns:
- individual: GTEx individual
- gene: Ensembl gene identifier
- tissue: GTEx tissue
- tissue_type: GTEx tissue type
- mu: OUTRIDER-estimated expected gene expression
- theta: OUTRIDER-estimated gene dispersion
- counts: Raw gene expression count
- normalized_counts: OUTRIDER-normalized gene expression count
- l2fc: log2 fold change between observed and expected gene expression count
- zscore: z-score of gene expression, obtained by quantile-mapping the OUTRIDER-estimated distribution to the standard normal distribution
- nominal_pvalue: OUTRIDER-estimated p-value of being an expression outlier
- FDR: FDR-adjusted p-value of being an expression outlier
- is_in_benchmark: Whether this observation is part of the aberrant gene expression prediction benchmark
- is_underexpressed_outlier: Whether this observation is an underexpression outlier at FDR < 20%. This is the benchmark prediction label.
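As a rough illustration of how these columns fit together, the sketch below restricts the table to benchmark observations and computes the fraction of underexpression outliers per tissue. It assumes pandas with a parquet engine is installed and that `is_in_benchmark` and `is_underexpressed_outlier` are stored as booleans. The helper names `load_benchmark` and `outlier_rate_by_tissue` and all toy values are made up for illustration.

```python
import pandas as pd

# Columns needed here; names are taken from the column list above.
COLUMNS = [
    "individual", "gene", "tissue",
    "is_in_benchmark", "is_underexpressed_outlier",
]


def load_benchmark(path="aberrant_expression_prediction_benchmark.parquet"):
    """Read only the needed columns, which keeps memory use down for a large file."""
    return pd.read_parquet(path, columns=COLUMNS)


def outlier_rate_by_tissue(df):
    """Fraction of benchmark observations labelled as underexpression outliers, per tissue.

    Assumes both flag columns are boolean (an assumption, not confirmed by the description).
    """
    bench = df[df["is_in_benchmark"]]
    return bench.groupby("tissue")["is_underexpressed_outlier"].mean()


# Toy rows with made-up identifiers, only to show the expected shape of the data.
toy = pd.DataFrame({
    "individual": ["ind1", "ind2", "ind3", "ind4"],
    "gene": ["geneA", "geneA", "geneA", "geneA"],
    "tissue": ["tissueX", "tissueX", "tissueY", "tissueY"],
    "is_in_benchmark": [True, True, True, False],
    "is_underexpressed_outlier": [True, False, False, True],
})
rates = outlier_rate_by_tissue(toy)
print(rates)
```

On the real data, `outlier_rate_by_tissue(load_benchmark())` would apply the same logic, provided the file is in the working directory.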
The isoform proportions table (gtex_v8_isoform_proportions.tsv) contains the following columns:
- gene: Ensembl gene identifier
- tissue_type: GTEx tissue type
- tissue: GTEx tissue
- transcript: Ensembl transcript identifier
- mean_transcript_proportions: mean transcript proportions across individuals in GTEx v8
- median_transcript_proportions: median transcript proportions across individuals in GTEx v8
- sd_transcript_proportions: standard deviation of transcript proportions across individuals in GTEx v8
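One possible use of this table is to look up the dominant transcript of each gene in each tissue. The sketch below uses only the Python standard library and assumes the file is tab-separated with a header row carrying the column names listed above. The function name `dominant_transcripts` and the toy rows are hypothetical.

```python
import csv
import io


def dominant_transcripts(rows):
    """Map (gene, tissue) to the transcript with the largest median proportion."""
    best = {}
    for row in rows:
        try:
            prop = float(row["median_transcript_proportions"])
        except ValueError:
            # Skip rows whose proportion is not numeric (e.g. a missing-value marker).
            continue
        key = (row["gene"], row["tissue"])
        if key not in best or prop > best[key][1]:
            best[key] = (row["transcript"], prop)
    return best


# Toy table with made-up identifiers and values, mirroring the assumed layout.
toy_tsv = (
    "gene\ttissue_type\ttissue\ttranscript\tmean_transcript_proportions\t"
    "median_transcript_proportions\tsd_transcript_proportions\n"
    "geneA\ttypeX\ttissueX\ttxA1\t0.70\t0.72\t0.10\n"
    "geneA\ttypeX\ttissueX\ttxA2\t0.30\t0.28\t0.10\n"
    "geneA\ttypeY\ttissueY\ttxA1\t0.40\t0.35\t0.20\n"
    "geneA\ttypeY\ttissueY\ttxA2\t0.60\t0.65\t0.20\n"
)
reader = csv.DictReader(io.StringIO(toy_tsv), delimiter="\t")
dominant = dominant_transcripts(reader)
print(dominant)
```

For the real file, the same function could be fed `csv.DictReader(open("gtex_v8_isoform_proportions.tsv", newline=""), delimiter="\t")`, which streams rows rather than loading the whole table.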
The expected gene expression table (gtex_v8_expected_expression.tsv) contains the following columns:
- gene: Ensembl gene identifier
- tissue_type: GTEx tissue type
- tissue: GTEx tissue
- gene_is_expressed: Whether the gene is expressed in the tissue
- median_expression: median OUTRIDER-estimated expected gene expression (mu) across individuals
- expression_dispersion: OUTRIDER-estimated gene dispersion (theta)
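A minimal sketch of how this table might be filtered to expressed genes per tissue follows. It assumes a tab-separated file with a header row. How `gene_is_expressed` is encoded is not stated here, so the hypothetical `parse_flag` helper accepts a few common spellings. All identifiers and values in the toy table are made up.

```python
import csv
import io
from collections import defaultdict


def parse_flag(value):
    """Interpret a textual boolean; the accepted spellings are an assumption."""
    return str(value).strip().lower() in {"true", "t", "1"}


def expressed_genes_by_tissue(rows):
    """Map tissue -> {gene: median_expression} for genes flagged as expressed."""
    result = defaultdict(dict)
    for row in rows:
        if parse_flag(row["gene_is_expressed"]):
            result[row["tissue"]][row["gene"]] = float(row["median_expression"])
    return dict(result)


# Toy table with made-up values, mirroring the assumed layout.
toy_tsv = (
    "gene\ttissue_type\ttissue\tgene_is_expressed\tmedian_expression\t"
    "expression_dispersion\n"
    "geneA\ttypeX\ttissueX\tTrue\t120.5\t35.0\n"
    "geneB\ttypeX\ttissueX\tFalse\t0.4\t2.0\n"
    "geneA\ttypeY\ttissueY\tTrue\t80.0\t20.0\n"
)
reader = csv.DictReader(io.StringIO(toy_tsv), delimiter="\t")
expressed = expressed_genes_by_tissue(reader)
print(expressed)
```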
Files
Total size: 5.3 GB

MD5 checksum | Size |
---|---|
md5:31e34226ef8a1ab058bdcfb7525d693f | 4.2 GB |
md5:84afc5e9d5922776fc7b4e541445711f | 100.3 MB |
md5:22285ef968fd38e1e177e4031cf6d827 | 932.8 MB |