Published November 27, 2018 | Version v1
Poster Open

Digital Expression Explorer 2: a repository of 4.5 trillion uniformly processed RNA-seq reads and counting

  • 1. Deakin University, Geelong, Australia, School of Life and Environmental Sciences
  • 2. Epigenetics in Human Health and Disease Laboratory, Department of Diabetes, Monash University, Melbourne, Australia, The Alfred Medical Research and Education Precinct, Melbourne, Vic, Australia
  • 3. Epigenetics in Human Health and Disease Laboratory, Department of Diabetes, Monash University, Melbourne, Australia, The Alfred Medical Research and Education Precinct, Melbourne, Vic, Australia. Hong Kong Institute of Diabetes and Obesity, Prince of Wales Hospital, The Chinese University of Hong Kong, Hong Kong SAR

Description

Background: Transcriptome profiling by RNA-seq has enhanced scientific understanding of gene regulation. Despite the benefits these data have brought in terms of transcriptome coverage and accuracy, there are considerable barriers to entry for the novice computational biologist who wants to analyse these large data sets. There is a clear need for a repository of uniformly processed RNA-seq data that is easy to use and represents the major model organisms.

Findings: To address these obstacles, we developed Digital Expression Explorer 2 (DEE2), a web-based repository of RNA-seq data in the form of gene-level and transcript-level expression counts. DEE2 contains over 400,000 RNA-seq data sets from several species, including yeast, Arabidopsis, worm, fruit fly, zebrafish, rat, mouse and human. Base-space sequence data downloaded from the NCBI Sequence Read Archive underwent quality analysis, filtering and trimming prior to transcriptome and genome alignment and read counting using open-source tools. Uniform reference genomes and data processing methods ensure consistency across experiments, facilitating fast and reproducible meta-analyses.

Conclusions: The web interface enables users to quickly identify data sets of interest through accession number and keyword searches. These data can also be accessed programmatically using a purpose-built R script. We demonstrate that DEE2 data are compatible with statistical packages such as edgeR and DESeq. DEE2 can be found at http://dee2.io
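To illustrate what working with gene-level counts can look like before they reach a statistical package, the following is a minimal Python sketch. It is not the DEE2 R script and does not use any DEE2 interface. The gene names, the toy count values, the `min_total` threshold and both helper functions are assumptions made for illustration only. It drops genes with very few reads and then scales each run to counts per million (CPM), a common exploratory normalisation.

```python
from typing import Dict, List


def filter_low_counts(counts: Dict[str, List[int]], min_total: int = 10) -> Dict[str, List[int]]:
    """Keep genes whose summed count across all runs is at least min_total."""
    return {gene: row for gene, row in counts.items() if sum(row) >= min_total}


def counts_per_million(counts: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Scale each run (column) so that its counts sum to one million."""
    if not counts:
        return {}
    n_runs = len(next(iter(counts.values())))
    # Library size = total reads assigned to the retained genes in each run.
    library_sizes = [sum(row[i] for row in counts.values()) for i in range(n_runs)]
    return {
        gene: [
            row[i] * 1e6 / library_sizes[i] if library_sizes[i] else 0.0
            for i in range(n_runs)
        ]
        for gene, row in counts.items()
    }


# Hypothetical gene-by-run count matrix (two runs); values are made up.
toy_counts = {
    "geneA": [500, 300],
    "geneB": [1500, 700],
    "geneC": [0, 1],
}

kept = filter_low_counts(toy_counts, min_total=10)
cpm = counts_per_million(kept)

for gene, values in cpm.items():
    print(gene, [round(v, 1) for v in values])
```

CPM is useful for inspection and plotting. Packages such as edgeR and DESeq are designed to take raw integer counts as input, so a filter like the one above would normally be applied to the raw matrix and the unnormalised counts passed on.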

Files

Ziemann_ABACBS_2019_v3.pdf (87.1 kB, md5:8012ae7aaa8a1ba476eaeec2418c0890)
