Published August 12, 2020 | Version v2.5
Dataset Open

GREEN-DB: Genomic Regulatory Elements ENcyclopedia

  • 1. Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK

Description

GREEN-DB is a comprehensive collection of 2.4 million regulatory elements in the human genome, gathered from previously published databases, high-throughput screens and functional studies. Regulatory regions are classified by type (enhancers, promoters, silencers, bivalent), and information on the controlled gene(s), tissue(s) and associated phenotype(s) is provided for each element when possible. We also calculated a variation constraint metric (range 0-1) for these regulatory regions and showed that genes controlled by constrained regions are enriched for disease-associated genes and essential genes from mouse knock-out screens.

The database also includes information from ENCODE TFBS and DNase peaks, ultra-conserved non-coding elements (UCNE), super-enhancers (dbSuper) and TAD domains (TAD-KB).

This release includes 5 files:

  • GREEN-DB_v2.5.db.gz: The full database in SQLite format
  • GRCh37_GREEN-DB.bed.gz[.csi]: An indexed BED file using GRCh37 genome coordinates, describing the regulatory regions and associated information useful for variant annotation (controlled genes, closest gene/TSS, constraint metric).
  • GRCh38_GREEN-DB.bed.gz[.csi]: An indexed BED file using GRCh38 genome coordinates, describing the regulatory regions and associated information useful for variant annotation (controlled genes, closest gene/TSS, constraint metric).
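
As a minimal sketch of how the SQLite file might be inspected after download, the snippet below decompresses a gzipped database and lists its tables using only the Python standard library. The database schema is not described on this page, so the code makes no assumption about table or column names. The helper names and the file paths in the usage comment are illustrative placeholders.

```python
import gzip
import shutil
import sqlite3


def decompress(gz_path, out_path):
    """Stream-decompress a .gz file to out_path without loading it into memory."""
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def list_tables(db_path):
    """Return the names of all tables in an SQLite database, sorted by name."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        con.close()
    return [name for (name,) in rows]


# Example usage (paths are placeholders for the downloaded file):
# decompress("GREEN-DB_v2.5.db.gz", "GREEN-DB_v2.5.db")
# print(list_tables("GREEN-DB_v2.5.db"))
```

Listing the tables first is a cautious way to explore the file before writing queries; the documentation linked below remains the authoritative source for the schema.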

To annotate a VCF file with information from GREEN-DB, you can use the BED files and our tool GREEN-VARAN (https://github.com/edg1983/GREEN-VARAN).
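
To illustrate what position-based annotation against the BED files amounts to, here is a minimal sketch that scans a gzipped BED file and returns the regions overlapping a single variant position. This is not GREEN-VARAN's implementation. It assumes only the standard BED convention (tab-separated chromosome, 0-based start and exclusive end in the first three columns) and returns the remaining columns untouched, since their order is not specified on this page. The function name is hypothetical, and the chromosome name must match the naming used in the file.

```python
import gzip


def overlapping_regions(bed_gz_path, chrom, pos):
    """Return BED records overlapping a 1-based position on chrom.

    Each hit is (start, end, extra_columns), with start and end in
    BED 0-based, half-open coordinates.
    """
    hits = []
    with gzip.open(bed_gz_path, "rt") as handle:
        for line in handle:
            # Skip blank lines and comment/header lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            fields = line.rstrip("\n").split("\t")
            start, end = int(fields[1]), int(fields[2])
            # A 1-based pos lies in the half-open interval when start < pos <= end.
            if fields[0] == chrom and start < pos <= end:
                hits.append((start, end, fields[3:]))
    return hits
```

This linear scan ignores the .csi index, so it is only reasonable for a handful of lookups; annotating a whole VCF is better left to an index-aware tool.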

For more information on GREEN-DB, please refer to our publication (https://doi.org/10.1101/2020.09.17.301960) and to the online documentation (https://green-varan.readthedocs.io/en/latest/).

GREEN-DB is free to use for academic users; please refer to the attached LICENSE file.

 

Changes from the previous version:

- We fixed an issue with alias symbols conversion that caused a small fraction of region-gene links to point to the wrong gene

- Due to the problem above, we removed any region-gene link where the region and the controlled gene were located on different chromosomes

- GREEN-DB now also includes TAD domain information from TAD-KB (http://dna.cs.miami.edu/TADKB/), and region-gene interactions are now annotated for occurrence within the same TAD

- The constraint metric model has been improved and now takes into account overlap with exonic regions

- In addition to the closest gene, an annotation for the closest TSS and its distance is now provided

Files

LICENSE.pdf

Files (9.1 GB)

Checksum (MD5)                          Size
md5:0bad211916b05a6880499045c159d6cb    85.6 MB
md5:ea268a589e3dd2c327d809a5683f38c9    109.0 kB
md5:b773b1eb1ef6e03ccdea70dcf736a17f    85.5 MB
md5:d8e74537fa48f454a0325ad5fc4fc8ac    108.8 kB
md5:17f166d7ed4ada18ed8fa25410cc7cdd    8.9 GB
md5:9e3ec86b01cca0759f05da79676ce931    69.4 kB
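
The MD5 checksums listed above can be used to confirm that a download completed intact. A minimal sketch using Python's standard hashlib module follows; the function names are illustrative and the file path in the usage comment is a placeholder.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with an expected value.

    The expected value may be given with or without a leading 'md5:'.
    """
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[4:]
    return md5_of_file(path) == expected


# Example usage (path is a placeholder for a downloaded file):
# matches_checksum("GREEN-DB_v2.5.db.gz", "md5:17f166d7ed4ada18ed8fa25410cc7cdd")
```

Reading in chunks keeps memory use flat, which matters for the multi-gigabyte database file.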

Additional details

Related works

Is documented by
Preprint: 10.1101/2020.09.17.301960 (DOI)