Published May 24, 2017 | Version v1
Dataset Open

Training data for de novo transcriptome reconstruction from RNA-seq data

  • 1. Johns Hopkins University
  • 2. University of Bradford


The data provided here are part of a Galaxy Training Network tutorial that uses a de novo transcriptome reconstruction strategy to analyze RNA-seq data from a study published by Wu et al., 2014 (DOI:10.1101/gr.164830.113). The goal of this study was to investigate "the dynamics of occupancy and the role in gene regulation of the transcription factor Tal1, a critical regulator of hematopoiesis, at multiple stages of hematopoietic differentiation." To this end, RNA-seq libraries were constructed from multiple mouse cell types, including megakaryocytes and G1E, a GATA-null immortalized cell line derived from targeted disruption of GATA-1 in mouse embryonic stem cells. These RNA-seq data were used to determine differential gene expression between G1E and megakaryocytes, and were later correlated with Tal1 occupancy. This dataset (GEO Accession: GSE51338) consists of biological-replicate, paired-end, poly(A)-selected RNA-seq libraries. Because the large original files take a long time to process, we have downsampled the original raw data files to include only reads that align to a subset of interesting genomic loci identified by Wu et al. This dataset is even smaller than another training dataset (DOI:10.5281/zenodo.254485).
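Because the libraries are paired-end, each forward-read file should contain the same number of reads as its reverse-read mate, even after downsampling. The following is a minimal sketch of such a sanity check, assuming the downloaded files are standard four-line FASTQ (optionally gzip-compressed). The function names and the file names in the usage note are hypothetical and are not taken from this record.

```python
import gzip
from pathlib import Path


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming four lines per record.

    Files whose name ends in .gz are read through gzip.
    """
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path} has {n_lines} lines, not a multiple of 4")
    return n_lines // 4


def pairs_match(forward_path, reverse_path):
    """Return True if both mate files hold the same number of reads."""
    return count_fastq_reads(forward_path) == count_fastq_reads(reverse_path)
```

For example, `pairs_match("sample_R1.fastq", "sample_R2.fastq")` (placeholder file names) would return `True` for a correctly paired set of files. This checks read counts only; it does not verify that read identifiers appear in the same order in both files.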


Files (1.7 GB)

Size
277.3 MB
277.3 MB
362.2 MB
362.2 MB
213.4 MB
213.4 MB
12.7 MB
12.7 MB
427.4 kB

Additional details


  • Afgan, E et al. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update. Nucleic Acids Research 44, W3-W10 (2016).
  • Wu, W et al. Dynamic shifts in occupancy by TAL1 are guided by GATA factors and drive large-scale reprogramming of gene expression during hematopoiesis. Genome Research 24, 1945-1962 (2014).