Published April 14, 2020 | Version 1
Journal article Open

EpiRegio: Analysis and retrieval of regulatory elements linked to genes

  • 1. Institute for Cardiovascular Regeneration, Goethe University Hospital, 60590 Frankfurt am Main, Germany
  • 2. Institute for Cardiovascular Regeneration, Goethe University Hospital, 60590 Frankfurt am Main, Germany
  • 3. Genome Institute of Singapore, 60 Biopolis Street, Genome, 02-01 Singapore 138672; Cluster of Excellence, Multimodal Computing and Interaction, Saarland Informatics Campus, 66123 Saarbrücken, Germany
  • 4. Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Maximus-von-Imhof-Forum 3, 85354 Freising, Germany

Description

The data set contains all regulatory elements (REMs) and the additional information used to create the EpiRegio webserver (https://epiregio.de).

The data set consists of 10 tables (CSV-files):

  • GenomeAnnotation: genome and annotation metadata (genomeVersion, annotationVersion and databaseName) (GenomeAnnotation_1.csv.gz)
  • GeneAnnotation: gene information (chr, start, end, geneID, geneSymbol, alternativeGeneID, isTF, strand and annotationVersion) (GeneAnnotation_1.csv.gz)
  • GeneExpression of Blueprint and Roadmap: one table per consortium with gene expression values (geneID, sampleID, expressionLog2TPM and species) (GeneExpression_Blueprint_1.csv.gz and GeneExpressionRoadmap_1.csv.gz)
  • CellTypeInfo: the cell and tissue types used (cellTypeID, cellTypeName and cellOntologyTerm) (CellTypeInfo.csv.gz)
  • sampleInfo of Roadmap and Blueprint: one table per consortium with sample information (sampleID, originalSampleID, cellTypeID, origin and dataType) (sampleInfo_Blueprint_1.csv.gz and sampleInfo_Roadmap_1.csv.gz)
  • REMAnnotation: all REMs predicted with STITCHIT (chr, start, end, geneID, REMID, regressionCoefficient, pValue, normModelScore, meanDNase1Signal, sdDNase1Signal, consortium and version) (REMAnnotationModelScore_1.csv.gz)
  • REMActivity: for each REM, the DNase1 signal and the standardised DNase1 signal per cell or tissue type (REMID, sampleID, dnase1Log2, standDnase1Log2 and version) (REMActivity_1.csv.gz)
  • clusterREMs: all clusters of REMs (CREMs) (REMID, CREMID, chr, start, end, REMsPerCREM and version) (clusterREMs_1.csv.gz)
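As an illustration of how the column lists above can be used, the following is a minimal sketch that streams the gzipped REMAnnotation table and keeps the REMs linked to one gene. It rests on assumptions not stated in this record: that the file is comma-delimited, that its columns appear in the order listed above, and that it has no header row (set the hypothetical `has_header` flag to `True` if it does). The function name `rems_for_gene` is introduced here for illustration only.

```python
import csv
import gzip

# Column order as listed for REMAnnotation in the description (assumed to
# match the physical column order in the file).
REM_COLUMNS = [
    "chr", "start", "end", "geneID", "REMID", "regressionCoefficient",
    "pValue", "normModelScore", "meanDNase1Signal", "sdDNase1Signal",
    "consortium", "version",
]


def rems_for_gene(path, gene_id, columns=REM_COLUMNS, has_header=False):
    """Yield rows (as dicts of strings) whose geneID equals gene_id."""
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle)  # assumption: comma-delimited
        if has_header:
            next(reader, None)  # skip the header row if the file has one
        for row in reader:
            record = dict(zip(columns, row))
            if record.get("geneID") == gene_id:
                yield record


# Hypothetical usage (the gene ID is an arbitrary example):
# for rem in rems_for_gene("REMAnnotationModelScore_1.csv.gz", "ENSG00000139618"):
#     print(rem["REMID"], rem["chr"], rem["start"], rem["end"])
```

Streaming row by row avoids loading a whole table into memory, which matters because the largest file in this record is several gigabytes.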

With these tables, the underlying database of EpiRegio can be reconstructed. The source code for the current version of the EpiRegio webserver is available at 10.5281/zenodo.3751189. EpiRegio uses the STITCHIT algorithm; the corresponding manuscript is currently under revision, and the preprint is available at http://dx.doi.org/10.1101/585125.
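One way to rebuild a queryable database from the tables is sketched below. It loads a gzipped CSV file into SQLite, which is used here purely for illustration and is not necessarily the database engine behind EpiRegio. The helper name `load_table`, the `has_header` flag, the comma delimiter, and the untyped columns are all assumptions; column names should be taken from the table descriptions in this record.

```python
import csv
import gzip
import sqlite3


def load_table(conn, table, path, columns, has_header=False):
    """Create `table` with the given columns and insert all rows of a gzipped CSV."""
    column_sql = ", ".join(f'"{name}"' for name in columns)
    placeholders = ", ".join("?" for _ in columns)
    # Columns are left untyped, so values are stored as the strings read from the file.
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({column_sql})')
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle)  # assumption: comma-delimited
        if has_header:
            next(reader, None)  # skip the header row if the file has one
        conn.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})', reader
        )
    conn.commit()


# Hypothetical usage:
# conn = sqlite3.connect("epiregio.sqlite")
# load_table(conn, "clusterREMs", "clusterREMs_1.csv.gz",
#            ["REMID", "CREMID", "chr", "start", "end", "REMsPerCREM", "version"])
# conn.execute('SELECT COUNT(*) FROM "clusterREMs"').fetchone()
```

For real use, declaring column types and adding indexes on geneID and REMID would likely be worthwhile, since those fields link the tables to each other.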

 

Notes

This work has been supported by the DZHK (German Centre for Cardiovascular Research, 81Z0200101), the Cardio-Pulmonary Institute (CPI) [EXC 2026], and the DFG SFB/TRR 267 Noncoding RNAs in the cardiovascular system.

Files

Files (7.1 GB)

Checksum and size per file:

  • md5:893482b8e2ec02346408f9d0ede2aa42 (1.0 kB)
  • md5:945f87f088b5e0dd8566a7fd23e5c9c3 (7.4 MB)
  • md5:4fb2808ccf9b8f31bd8c29c7cfbf0a7f (1.0 MB)
  • md5:2b43a6632029c7b289f260d6950b2874 (21.0 MB)
  • md5:77753f17fcb15518046ac296dd84cf07 (43.1 MB)
  • md5:83316853916b1df35f35d931d2d0c53f (97 Bytes)
  • md5:d4668f53c78a657d26379035f96d9ef7 (6.9 GB)
  • md5:058bcffb802579e8dbc7296b504d351f (130.3 MB)
  • md5:f5be3ba36d69cf79abb438c4305ce310 (557 Bytes)
  • md5:f1412276ae4f5dc343b0f8930366053b (1.1 kB)
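
The MD5 checksums listed above can be used to check that a download is complete. The sketch below is one way to do this with the Python standard library; the function name `md5_matches` is hypothetical, and since the listing above does not pair checksums with file names, the caller has to supply the checksum that belongs to the file being checked.

```python
import hashlib


def md5_matches(path, expected, chunk_size=1 << 20):
    """Return True if the MD5 digest of the file at `path` equals `expected`.

    `expected` may be given with or without the "md5:" prefix used above.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so multi-gigabyte files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return digest.hexdigest() == expected_hex


# Hypothetical usage; the pairing of file and checksum is an assumption:
# md5_matches("REMActivity_1.csv.gz", "md5:d4668f53c78a657d26379035f96d9ef7")
```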

Additional details

Related works

Compiles
Software documentation: 10.5281/zenodo.3751189 (DOI)
Is derived from
Journal article: 10.5281/zenodo.3665990 (DOI)