Published April 14, 2020 | Version 1
Journal article Open

EpiRegio: Analysis and retrieval of regulatory elements linked to genes

  • 1. Institute for Cardiovascular Regeneration, Goethe University Hospital, 60590 Frankfurt am Main, Germany
  • 2. Institute for Cardiovascular Regeneration, Goethe University Hospital, 60590 Frankfurt am Main, Germany
  • 3. Genome Institute of Singapore, 60 Biopolis Street, Genome, 02-01 Singapore 138672; Cluster of Excellence, Multimodal Computing and Interaction, Saarland Informatics Campus, 66123 Saarbrücken, Germany
  • 4. Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Maximus-von-Imhof-Forum 3, 85354 Freising, Germany


The data set contains all regulatory elements (REMs) and the additional information used to create the EpiRegio  webserver ( 

The data set consists of 10 tables (CSV-files):

  • GenomeAnnotation: contains information about genomeVersion, annotationVersion and databaseName (GenomeAnnotation_1.csv.gz)
  • GeneAnnotation: information about the genes (chr, start, end, geneID, geneSymbol, alternativeGeneID, isTF, strand and annotationVersion) (GeneAnnotation_1.csv.gz)
  • GeneExpression of Blueprint and Roadmap: one table per consortium, containing geneID, sampleID, expressionLog2TPM and species (GeneExpression_Blueprint_1.csv.gz and GeneExpressionRoadmap_1.csv.gz)
  • CellTypeInfo: information about the cell and tissue types used (cellTypeID, cellTypeName and cellOntologyTerm) (CellTypeInfo.csv.gz)
  • sampleInfo of Roadmap and Blueprint: one table per consortium, containing sampleID, originalSampleID, cellTypeID, origin and dataType (sampleInfo_Blueprint_1.csv.gz and sampleInfo_Roadmap_1.csv.gz)
  • REMAnnotation: contains all REMs predicted with STITCHIT (chr, start, end, geneID, REMID, regressionCoefficient, pValue, normModelScore, meanDNase1Signal, sdDNase1Signal, consortium and version) (REMAnnotationModelScore_1.csv.gz)
  • REMActivity: contains, for each REM, the DNase signal and the standardised DNase signal per cell or tissue type (REMID, sampleID, dnase1Log2, standDnase1Log2 and version) (REMActivity_1.csv.gz)
  • clusterREMs: contains all CREMs (REMID, CREMID, chr, start, end, REMsPerCREM and version) (clusterREMs_1.csv.gz)
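
As an illustration of how the tables relate, the following Python sketch streams two of the gzipped CSV files and collects the REMs linked to a gene symbol through the shared geneID column. The column order is taken from the list above; whether the files carry a header row is not stated in this record, so it is exposed as the `has_header` parameter. The function names `read_table` and `rems_for_symbol` are hypothetical helpers, not part of EpiRegio.

```python
import csv
import gzip

# Column names as listed in this record. That the CSV files store the columns
# in exactly this order is an assumption.
GENE_COLUMNS = ["chr", "start", "end", "geneID", "geneSymbol",
                "alternativeGeneID", "isTF", "strand", "annotationVersion"]
REM_COLUMNS = ["chr", "start", "end", "geneID", "REMID",
               "regressionCoefficient", "pValue", "normModelScore",
               "meanDNase1Signal", "sdDNase1Signal", "consortium", "version"]


def read_table(path, columns, has_header=False):
    """Yield one dict per non-empty row of a gzipped CSV file."""
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # skip the header row if the file has one
        for row in reader:
            if row:
                yield dict(zip(columns, row))


def rems_for_symbol(rem_path, gene_path, symbol, has_header=False):
    """Return all REM rows whose geneID belongs to the given gene symbol."""
    gene_ids = {
        gene["geneID"]
        for gene in read_table(gene_path, GENE_COLUMNS, has_header)
        if gene["geneSymbol"] == symbol
    }
    # The REM table is streamed row by row, so it never has to fit in memory.
    return [
        rem
        for rem in read_table(rem_path, REM_COLUMNS, has_header)
        if rem["geneID"] in gene_ids
    ]
```

A call such as `rems_for_symbol("REMAnnotationModelScore_1.csv.gz", "GeneAnnotation_1.csv.gz", symbol)` would then return the matching rows as dictionaries with string values.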

With these tables, the underlying database of EpiRegio can easily be reconstructed. The source code for the current version of the EpiRegio webserver is available at 10.5281/zenodo.3751189. EpiRegio uses the STITCHIT algorithm, which is currently under revision. The preprint is available at 
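
A minimal sketch of such a reconstruction, assuming a local SQLite file as a stand-in for whatever database system the webserver actually uses (this record does not name it). The helper `load_table` is hypothetical. It stores every column as untyped text and assumes the CSV columns appear in the order listed above, with the presence of a header row left as a parameter.

```python
import csv
import gzip
import sqlite3

# Columns of the clusterREMs table as listed in this record (order assumed).
CLUSTER_COLUMNS = ["REMID", "CREMID", "chr", "start", "end",
                   "REMsPerCREM", "version"]


def load_table(conn, table, columns, path, has_header=False):
    """Create `table` if needed and fill it from a gzipped CSV file."""
    # Identifiers are double-quoted because names such as "end" are SQL keywords.
    quoted = ", ".join('"{}"'.format(name) for name in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute('CREATE TABLE IF NOT EXISTS "{}" ({})'.format(table, quoted))
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # skip the header row if the file has one
        rows = (row for row in reader if row)  # ignore blank lines
        conn.executemany(
            'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), rows
        )
    conn.commit()
```

Calling `load_table(sqlite3.connect("epiregio.db"), "clusterREMs", CLUSTER_COLUMNS, "clusterREMs_1.csv.gz")` once per table, with the matching column list, would yield a queryable local copy. Typed columns and indexes on geneID and REMID would be sensible refinements for the larger tables.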



This work has been supported by the DZHK (German Centre for Cardiovascular Research, 81Z0200101) and the Cardio-Pulmonary Institute (CPI) [EXC 2026], and the DFG SFB/TRR 267 Noncoding RNAs in the cardiovascular system.


Files (7.1 GB)

Size
1.0 kB
7.4 MB
1.0 MB
21.0 MB
43.1 MB
97 Bytes
6.9 GB
130.3 MB
557 Bytes
1.1 kB

Additional details

Related works

Software documentation: 10.5281/zenodo.3751189 (DOI)
Is derived from
Journal article: 10.5281/zenodo.3665990 (DOI)