Published March 29, 2024 | Version v4
Dataset | Open access

GEO gene expression dataset recompute for selected tumor samples

  • University of Turin

Contributors

  • University of Turin

Description

We aligned and quantified publicly available RNA-Seq data from selected GEO series with a single standardized pipeline, so that all samples share the same preprocessing and can be used together in downstream analyses.

All uploaded files are gzip-compressed, UTF-8 encoded `.csv` matrices. The `*_expected_count.csv.gz` files contain expected counts, neither log-transformed nor normalized, as estimated by RSEM's `rsem-calculate-expression` (see details below); because they are estimates, they are not necessarily integers. Each associated `*_metadata.csv.gz` file contains metadata for every column of the corresponding expression matrix.
Some metadata files may have more rows than the corresponding expression matrix has sample columns. This happens for series that were only partially RNA-Seq based (e.g. RNA-Seq and miRNA-Seq samples combined under the same GEO accession ID).
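As an illustration, a count matrix can be read with nothing but the Python standard library. This is a minimal sketch, not part of the `x.FASTQ` tooling: the function names are hypothetical, and it assumes a comma-separated header row containing a `gene_id` column plus one numeric column per ENA run accession. Values are kept as floats because RSEM expected counts can be fractional.

```python
import csv
import gzip
from typing import Dict, Iterable, List, Tuple


def parse_expected_counts(lines: Iterable[str]) -> Tuple[List[str], Dict[str, Dict[str, float]]]:
    """Parse a count matrix into (run_ids, {gene_id: {run_id: count}}).

    Assumption: comma-separated text with a header row that includes a
    `gene_id` column; every other column is treated as one sample (run).
    """
    reader = csv.reader(lines)
    header = next(reader)
    gene_col = header.index("gene_id")  # raises ValueError if the column is missing
    run_ids = [name for i, name in enumerate(header) if i != gene_col]
    counts: Dict[str, Dict[str, float]] = {}
    for row in reader:
        if not row:
            continue  # skip blank lines, if any
        gene_id = row[gene_col]
        values = [v for i, v in enumerate(row) if i != gene_col]
        # Expected counts may be fractional, so keep them as floats (no rounding).
        counts[gene_id] = {run: float(v) for run, v in zip(run_ids, values)}
    return run_ids, counts


def load_expected_counts(path: str) -> Tuple[List[str], Dict[str, Dict[str, float]]]:
    """Open a gzip-compressed, UTF-8 encoded CSV file and parse it."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        return parse_expected_counts(handle)
```

For example, `runs, counts = load_expected_counts("GSE22260_expected_count.csv.gz")` (a hypothetical file name following the `*_expected_count.csv.gz` pattern) would return the run accessions and a nested gene-to-run lookup. For large matrices, a dataframe library is likely to be more memory-efficient than nested dictionaries.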

Metadata columns are taken from the GEO series files and keep their original definitions; refer to the corresponding GEO entry for the meaning of each field.

Each expression matrix has a `gene_id` column holding Ensembl gene IDs; every other column is named after the ENA run accession ID of one recomputed sample.
Each associated metadata file has at least the following columns:
- `geo_accession`: the GEO sample accession (GSM ID).
- `sample_accession`: the ENA sample accession.
- `run_accession`: the ENA run accession ID, matching a column name in the corresponding expression matrix.
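The `run_accession` column is the key that links the two files. The sketch below (a hypothetical helper, Python standard library only) returns one metadata record per expression column, in column order, and drops metadata rows that have no matching column, such as the non-RNA-Seq samples of a mixed series. It assumes comma-separated metadata with a header row.

```python
import csv
from typing import Dict, Iterable, List


def align_metadata(run_ids: List[str], metadata_lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return one metadata record per expression-matrix column, in column order.

    Rows whose `run_accession` is not among `run_ids` (e.g. runs that were
    not recomputed) are ignored; a column without metadata raises KeyError.
    """
    by_run = {row["run_accession"]: row for row in csv.DictReader(metadata_lines)}
    missing = [run for run in run_ids if run not in by_run]
    if missing:
        raise KeyError(f"no metadata row for runs: {missing}")
    return [by_run[run] for run in run_ids]
```

Both arguments can come straight from the compressed files: `run_ids` from the header of the expression matrix (minus `gene_id`), and `metadata_lines` from a handle opened with `gzip.open(path, "rt", encoding="utf-8", newline="")`.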

## Pipeline Details

Alignment and quantification were performed with the `x.FASTQ` tool, available [on GitHub](https://github.com/TCP-Lab/x.FASTQ), installed locally on an Arch Linux machine running the Linux `6.7.8-zen1-1-zen` kernel with an `11th Gen Intel i7-1185G7 (8)` CPU and an `Intel TigerLake-LP GT2 [Iris Xe Graphics]` GPU.

Files (9.1 MB)

  • md5:7a7e362b6874baae7dfdcec425614a61 (270.0 kB)
  • md5:61a48d9699d05c330c1aadf4fbeada0d (1.8 kB)
  • md5:962b514222f894665c5aa00a23aa0b34 (2.3 MB)
  • md5:6f97560e55b6add6cbc8487e098817ce (4.2 kB)
  • md5:f668552471a99fcb563023adac4bc24e (1.7 MB)
  • md5:bbda66b06c198df1f6f336fa18b4f197 (2.4 kB)
  • md5:bb6edde4350c8e9005610a074c0e7aab (414.3 kB)
  • md5:65d6762bc4531660488e9044ab47054d (1.2 kB)
  • md5:6a4bf3c125f40b5b3af7b26272df5fe6 (4.3 MB)
  • md5:caa6126f2c5fa104135bde24a712f7c1 (3.3 kB)

Additional details

Related works

Is derived from:
  • Dataset: GSE22260
  • Dataset: GSE29580
  • Dataset: GSE121842
  • Dataset: GSE159857
  • Dataset: GSE60052