There is a newer version of the record available.

Published July 4, 2022 | Version 1.0.1
Dataset Open

Field-wide assessment of differential HT-seq from NCBI GEO database

  • 1. University of Tartu

Description

We analysed the field of expression profiling by high-throughput sequencing (HT-seq) in terms of replicability and reproducibility, using data from the NCBI GEO (Gene Expression Omnibus) repository.

This release includes GEO series published up to December 31, 2020.

The archived dataset contains the following files:

- output/parsed_suppfiles.csv, p-value histograms, histogram classes, estimated proportion of true null hypotheses (pi0).

- output/document_summaries.csv, document summaries of NCBI GEO series.

- output/suppfilenames.txt, list of all supplementary file names of NCBI GEO submissions.

- output/suppfilenames_filtered.txt, list of supplementary file names used for downloading files from NCBI GEO.

- output/publications.csv, publication info of NCBI GEO series.

- output/scopus_citedbycount.csv, Scopus citation info of NCBI GEO series.

- output/spots.csv, NCBI SRA sequencing run metadata.

- output/cancer.csv, cancer-related experiment accessions.

- output/transcription_factor.csv, transcription factor (TF)-related experiment accessions.

- output/single-cell.csv, single-cell experiment accessions.

- blacklist.txt, list of supplementary files that were either too large to import or caused the computing environment to crash during import.
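
To illustrate what a p-value histogram and a pi0 estimate summarise, here is a minimal Python sketch. It is not the workflow's own code: it assumes Storey's estimator with a fixed threshold `lam`, which may differ from the method used to produce output/parsed_suppfiles.csv, and the function names `pvalue_histogram` and `storey_pi0` are hypothetical.

```python
def pvalue_histogram(pvalues, bins=20):
    """Count p-values in `bins` equal-width bins covering [0, 1]."""
    counts = [0] * bins
    for p in pvalues:
        # p == 1.0 would index past the end, so clamp it into the last bin.
        idx = min(int(p * bins), bins - 1)
        counts[idx] += 1
    return counts


def storey_pi0(pvalues, lam=0.5):
    """Estimate the proportion of true null hypotheses (pi0).

    Assumption: Storey's estimator. P-values above `lam` are treated as
    mostly coming from true nulls, which are uniform on [0, 1], so their
    share is rescaled by 1 / (1 - lam). The result is capped at 1.0.
    """
    if not pvalues:
        raise ValueError("pvalues must be non-empty")
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / (len(pvalues) * (1.0 - lam)))


if __name__ == "__main__":
    example = [0.01, 0.02, 0.03, 0.9]  # made-up p-values for illustration
    print(pvalue_histogram(example, bins=4))
    print(storey_pi0(example))
```

A p-value set dominated by small values yields a low pi0, whereas a flat histogram yields a pi0 close to 1.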

 

The workflow used to produce this dataset is available on GitHub at rstats-tartu/geo-htseq.

Files

Files (66.7 MB)

md5:b8fbc7d67170ad512c46f0ee805d987a
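
The MD5 checksum listed for the archive can be used to check a download for corruption. The following is a minimal sketch using Python's standard `hashlib` module; the archive file name is not given in this record, so the path in the usage example is a placeholder, and the helper names `md5_of_file` and `verify` are hypothetical.

```python
import hashlib

# Checksum as listed in the Files section of this record.
EXPECTED_MD5 = "b8fbc7d67170ad512c46f0ee805d987a"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify("path/to/downloaded_archive")` returns `True` only if the downloaded file matches the listed checksum; the path here is a placeholder.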