Published January 12, 2023 | Version 1.0.2
Dataset Open

Field-wide assessment of differential HT-seq from NCBI GEO database

  • 1. University of Tartu

Description

We analysed the field of expression profiling by high-throughput sequencing (HT-seq) in terms of replicability and reproducibility, using data from the NCBI GEO (Gene Expression Omnibus) repository.

 

- This release includes GEO series published up to December 31, 2020.

 

The geo-htseq.tar.gz archive contains the following files:

- output/parsed_suppfiles.csv, p-value histograms, histogram classes, estimated proportion of true null hypotheses (pi0).

- output/document_summaries.csv, document summaries of NCBI GEO series.

- output/suppfilenames.txt, list of all supplementary file names of NCBI GEO submissions.

- output/suppfilenames_filtered.txt, list of supplementary file names used for downloading files from NCBI GEO.

- output/publications.csv, publication info of NCBI GEO series.

- output/scopus_citedbycount.csv, Scopus citation info of NCBI GEO series.

- output/spots.csv, NCBI SRA sequencing run metadata.

- output/cancer.csv, cancer-related experiment accessions.

- output/transcription_factor.csv, transcription factor (TF) related experiment accessions.

- output/single-cell.csv, single-cell experiment accessions.

- blacklist.txt, list of supplementary files that were either too large to import or caused the computing environment to crash during import.
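As a rough sketch of how the files listed above might be inspected without unpacking the whole archive, the following uses only the Python standard library. It assumes the archive has been downloaded locally, that the CSV members are UTF-8 encoded, and that member paths end with the names listed above (they may carry an extra top-level directory prefix). The helper names `list_members` and `read_csv_header` are hypothetical and not part of the published workflow.

```python
import csv
import io
import tarfile


def list_members(archive_path):
    """Return the names of all regular files stored in a tar.gz archive."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return [m.name for m in archive.getmembers() if m.isfile()]


def read_csv_header(archive_path, member_suffix):
    """Return the header row of the first member whose path ends with member_suffix.

    Matching on a suffix is an assumption made here so that an unknown
    top-level directory inside the archive does not matter.
    """
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.endswith(member_suffix):
                raw = archive.extractfile(member)
                # Assumed encoding: UTF-8.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                return next(csv.reader(text))
    raise FileNotFoundError(member_suffix)
```

For example, `read_csv_header("geo-htseq.tar.gz", "output/parsed_suppfiles.csv")` would return the column names of that table, which are not documented on this page.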

 

The workflow used to produce this dataset is available on GitHub at rstats-tartu/geo-htseq.

 

The geo-htseq-updates.tar.gz archive contains the following files:

- results/detools_from_pmc.csv, differential expression analysis programs inferred from published articles.

- results/n_data.csv, manually curated sample size info for NCBI GEO HT-seq series.

- results/simres_df_parsed.csv, pi0 values estimated from differential expression results obtained from simulated RNA-seq data.

- results/data/parsed_suppfiles_rerun.csv, pi0 values estimated using the smoother method from anti-conservative p-value sets.
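To give a feel for what a pi0 value represents, here is a minimal sketch of a fixed-threshold Storey-type estimator: the fraction of p-values above a threshold `lam`, rescaled by `1 - lam`. This is a simpler stand-in and not the smoother method used for the files above. The function name `storey_pi0` and the default `lam=0.5` are choices made for this illustration only.

```python
def storey_pi0(p_values, lam=0.5):
    """Estimate the proportion of true null hypotheses from a set of p-values.

    Uses pi0 = #{p > lam} / (m * (1 - lam)), capped at 1. Under the null,
    p-values are uniform, so the region above lam is assumed to be
    populated mostly by true nulls.
    """
    if not 0 <= lam < 1:
        raise ValueError("lam must lie in [0, 1)")
    m = len(p_values)
    if m == 0:
        raise ValueError("p_values must not be empty")
    above = sum(1 for p in p_values if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))
```

A p-value set with a strong peak near zero yields a pi0 well below 1, whereas a flat (uniform) histogram yields a value close to 1.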


Files

Two files, 70.1 MB in total:

- 3.4 MB, md5:dc183824492b70dbffb4421c99a598cc
- 66.7 MB, md5:b8fbc7d67170ad512c46f0ee805d987a
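The md5 checksums listed above can be used to check that a download is intact. The sketch below uses Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_checksum` are hypothetical. Because this listing does not show which checksum belongs to which archive, a downloaded file may need to be compared against both values.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 digest with a value such as 'md5:<hex digest>'."""
    # Drop an optional 'md5:' prefix and ignore letter case.
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum("geo-htseq.tar.gz", "md5:dc183824492b70dbffb4421c99a598cc")` returns `True` only if that checksum belongs to that archive and the download is complete.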