Journal article Open Access

EpiRegio: Analysis and retrieval of regulatory elements linked to genes

Baumgarten Nina; Hecker Dennis; Karunanithi Sivarajan; Schmidt Florian; List Markus; Schulz Marcel H.

The data set contains all regulatory elements (REMs) and the additional information used to create the EpiRegio webserver (https://epiregio.de).

The data set consists of 10 tables (gzip-compressed CSV files):

  • GenomeAnnotation: contains information about genomeVersion, annotationVersion and databaseName (GenomeAnnotation_1.csv.gz)
  • GeneAnnotation: information about the genes (chr, start, end, geneID, geneSymbol, alternativeGeneID, isTF, strand and annotationVersion) (GeneAnnotation_1.csv.gz)
  • GeneExpression of Blueprint and Roadmap: one table per consortium containing information about geneID, sampleID, expressionLog2TPM and species (GeneExpressionBlueprint_1.csv.gz and GeneExpressionRoadmap_1.csv.gz)
  • CellTypeInfo: information about the cell and tissue types used (cellTypeID, cellTypeName and cellOntologyTerm) (CellTypeInfo.csv.gz)
  • sampleInfo of Roadmap and Blueprint: one table per consortium containing information about sampleID, originalSampleID, cellTypeID, origin and dataType (sampleInfo_Blueprint_1.csv.gz and sampleInfo_Roadmap_1.csv.gz)
  • REMAnnotation: contains all REMs predicted with STITCHIT (chr, start, end, geneID, REMID, regressionCoefficient, pValue, normModelScore, meanDNase1Signal, sdDNase1Signal, consortium and version) (REMAnnotationModelScore_1.csv.gz)
  • REMActivity: contains, for each REM, the DNase1 signal and the standardised DNase1 signal in each cell or tissue type (REMID, sampleID, dnase1Log2, standDnase1Log2 and version) (REMActivity_1.csv.gz)
  • clusterREMs: contains all CREMs, i.e. clusters of REMs (REMID, CREMID, chr, start, end, REMsPerCREM and version) (clusterREMs_1.csv.gz)
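The column lists above suggest how the tables link: REMAnnotation.geneID points to GeneAnnotation, REMActivity ties each REMID to a sampleID, the sampleInfo tables map sampleID to cellTypeID, and CellTypeInfo names each cell type. The following Python sketch illustrates that key chain with SQLite on a handful of invented rows. It is an illustration under assumptions, not EpiRegio's own schema: the join keys are inferred from the column names, only a subset of columns is modelled, the two per-consortium sampleInfo tables are merged into a single hypothetical `sampleInfo` table, and the helper names `build_toy_db` and `rems_for_gene` are made up for this example.

```python
import sqlite3

# Toy schema: column names follow the table descriptions above (subset only).
# The two per-consortium sampleInfo tables are merged into one hypothetical
# "sampleInfo" table here for brevity.
SCHEMA = """
CREATE TABLE GeneAnnotation (geneID TEXT, geneSymbol TEXT);
CREATE TABLE REMAnnotation (REMID TEXT, geneID TEXT, pValue REAL);
CREATE TABLE REMActivity (REMID TEXT, sampleID TEXT, dnase1Log2 REAL);
CREATE TABLE sampleInfo (sampleID TEXT, cellTypeID TEXT);
CREATE TABLE CellTypeInfo (cellTypeID TEXT, cellTypeName TEXT);
"""

# Follow the (assumed) key chain gene -> REM -> activity -> sample -> cell type,
# keeping only REMs whose pValue does not exceed the given threshold.
QUERY = """
SELECT r.REMID, c.cellTypeName, a.dnase1Log2
FROM GeneAnnotation AS g
JOIN REMAnnotation AS r ON r.geneID = g.geneID
JOIN REMActivity AS a ON a.REMID = r.REMID
JOIN sampleInfo AS s ON s.sampleID = a.sampleID
JOIN CellTypeInfo AS c ON c.cellTypeID = s.cellTypeID
WHERE g.geneSymbol = ? AND r.pValue <= ?
ORDER BY r.REMID, c.cellTypeName
"""


def build_toy_db():
    """Create an in-memory database filled with invented example rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO GeneAnnotation VALUES (?, ?)",
                     [("ENSG_A", "GENEA"), ("ENSG_B", "GENEB")])
    conn.executemany("INSERT INTO REMAnnotation VALUES (?, ?, ?)",
                     [("REM1", "ENSG_A", 1e-5), ("REM2", "ENSG_A", 0.01),
                      ("REM3", "ENSG_B", 0.001)])
    conn.executemany("INSERT INTO REMActivity VALUES (?, ?, ?)",
                     [("REM1", "S1", 2.5), ("REM1", "S2", 0.1),
                      ("REM2", "S1", 1.2), ("REM3", "S2", 3.0)])
    conn.executemany("INSERT INTO sampleInfo VALUES (?, ?)",
                     [("S1", "CT1"), ("S2", "CT2")])
    conn.executemany("INSERT INTO CellTypeInfo VALUES (?, ?)",
                     [("CT1", "heart"), ("CT2", "liver")])
    return conn


def rems_for_gene(conn, gene_symbol, max_p=1.0):
    """Return (REMID, cellTypeName, dnase1Log2) rows for the REMs of one gene."""
    return conn.execute(QUERY, (gene_symbol, max_p)).fetchall()
```

For example, `rems_for_gene(build_toy_db(), "GENEA")` returns `[("REM1", "heart", 2.5), ("REM1", "liver", 0.1), ("REM2", "heart", 1.2)]`, and passing `max_p=0.001` drops REM2. The same query shape should carry over to the real tables once they are loaded, provided the join keys behave as the column names suggest.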

With these tables, the underlying database of EpiRegio can be reconstructed. The source code of the current EpiRegio webserver version is available at https://doi.org/10.5281/zenodo.3751189. EpiRegio uses the STITCHIT algorithm, which is currently under revision; the preprint is available at http://dx.doi.org/10.1101/585125.
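Because the tables are plain gzip-compressed CSV files, one way to rebuild a queryable database is to load them into SQLite using only the Python standard library. This is a minimal sketch, not EpiRegio's own import code (that is part of the linked source code); it assumes each file is comma-separated and starts with a header row naming the columns listed above, which should be checked against the actual files. The table names in `TABLES` and the helpers `import_rows` and `build_database` are hypothetical choices for this example.

```python
import csv
import gzip
import sqlite3
from pathlib import Path

# Hypothetical mapping of file names (from the file list) to table names.
TABLES = {
    "GenomeAnnotation_1.csv.gz": "GenomeAnnotation",
    "GeneAnnotation_1.csv.gz": "GeneAnnotation",
    "GeneExpressionBlueprint_1.csv.gz": "GeneExpressionBlueprint",
    "GeneExpressionRoadmap_1.csv.gz": "GeneExpressionRoadmap",
    "CellTypeInfo.csv.gz": "CellTypeInfo",
    "sampleInfo_Blueprint_1.csv.gz": "sampleInfoBlueprint",
    "sampleInfo_Roadmap_1.csv.gz": "sampleInfoRoadmap",
    "REMAnnotationModelScore_1.csv.gz": "REMAnnotation",
    "REMActivity_1.csv.gz": "REMActivity",
    "clusterREMs_1.csv.gz": "clusterREMs",
}


def import_rows(conn, table, reader):
    """Create `table` from the header row and bulk-insert the remaining rows."""
    header = next(reader)  # assumption: the first row holds the column names
    columns = ", ".join(f'"{name}" TEXT' for name in header)  # all columns TEXT
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" * len(header))
    # executemany consumes the reader lazily, so large files are streamed.
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    return header


def build_database(data_dir, db_path, delimiter=","):
    """Import every file from TABLES found in data_dir; return imported tables."""
    conn = sqlite3.connect(db_path)
    imported = []
    try:
        for file_name, table in TABLES.items():
            path = Path(data_dir) / file_name
            if not path.exists():
                continue  # skip files that were not downloaded
            with gzip.open(path, "rt", newline="") as handle:
                import_rows(conn, table, csv.reader(handle, delimiter=delimiter))
            conn.commit()
            imported.append(table)
    finally:
        conn.close()
    return imported
```

A call such as `build_database("epiregio_data", "epiregio.sqlite")` (directory and file names are placeholders) loads whatever files are present. Since every column is stored as TEXT, numeric columns such as pValue need `CAST(pValue AS REAL)` in comparisons, and an index like `CREATE INDEX rem_activity_remid ON REMActivity(REMID)` will likely be needed for fast lookups in the 6.9 GB REMActivity table.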

This work has been supported by the DZHK (German Centre for Cardiovascular Research, 81Z0200101), the Cardio-Pulmonary Institute (CPI) [EXC 2026] and the DFG SFB/TRR 267 "Noncoding RNAs in the cardiovascular system".
Files (7.1 GB)

Name                              Size       MD5 checksum
CellTypeInfo.csv.gz               1.0 kB     893482b8e2ec02346408f9d0ede2aa42
clusterREMs_1.csv.gz              7.4 MB     945f87f088b5e0dd8566a7fd23e5c9c3
GeneAnnotation_1.csv.gz           1.0 MB     4fb2808ccf9b8f31bd8c29c7cfbf0a7f
GeneExpressionBlueprint_1.csv.gz  21.0 MB    2b43a6632029c7b289f260d6950b2874
GeneExpressionRoadmap_1.csv.gz    43.1 MB    77753f17fcb15518046ac296dd84cf07
GenomeAnnotation_1.csv.gz         97 Bytes   83316853916b1df35f35d931d2d0c53f
REMActivity_1.csv.gz              6.9 GB     d4668f53c78a657d26379035f96d9ef7
REMAnnotationModelScore_1.csv.gz  130.3 MB   058bcffb802579e8dbc7296b504d351f
sampleInfo_Blueprint_1.csv.gz     557 Bytes  f5be3ba36d69cf79abb438c4305ce310
sampleInfo_Roadmap_1.csv.gz       1.1 kB     f1412276ae4f5dc343b0f8930366053b
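Given the size of REMActivity_1.csv.gz (6.9 GB), it is worth checking each download against the MD5 checksums listed above. A minimal sketch with Python's `hashlib`, reading files in 1 MiB chunks so that no file is held in memory at once; `EXPECTED_MD5` simply copies the checksums from the file list, while `md5sum`, `verify` and the data directory are illustrative names of our own.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file list above.
EXPECTED_MD5 = {
    "CellTypeInfo.csv.gz": "893482b8e2ec02346408f9d0ede2aa42",
    "clusterREMs_1.csv.gz": "945f87f088b5e0dd8566a7fd23e5c9c3",
    "GeneAnnotation_1.csv.gz": "4fb2808ccf9b8f31bd8c29c7cfbf0a7f",
    "GeneExpressionBlueprint_1.csv.gz": "2b43a6632029c7b289f260d6950b2874",
    "GeneExpressionRoadmap_1.csv.gz": "77753f17fcb15518046ac296dd84cf07",
    "GenomeAnnotation_1.csv.gz": "83316853916b1df35f35d931d2d0c53f",
    "REMActivity_1.csv.gz": "d4668f53c78a657d26379035f96d9ef7",
    "REMAnnotationModelScore_1.csv.gz": "058bcffb802579e8dbc7296b504d351f",
    "sampleInfo_Blueprint_1.csv.gz": "f5be3ba36d69cf79abb438c4305ce310",
    "sampleInfo_Roadmap_1.csv.gz": "f1412276ae4f5dc343b0f8930366053b",
}


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(data_dir):
    """Map each file name to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(data_dir) / name
        results[name] = (md5sum(path) == expected) if path.exists() else None
    return results
```

Any entry reported as `False` indicates a corrupted or incomplete download that should be fetched again before importing.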