GREEN-DB: Genomic Regulatory Elements ENcyclopedia
- 1. Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
Description
GREEN-DB is a comprehensive collection of 2.4 million regulatory elements in the human genome, gathered from previously published databases, high-throughput screens and functional studies. Regulatory regions are classified as enhancers, promoters, silencers or bivalent regions, and information on the controlled gene(s), tissue(s) and associated phenotype(s) is provided for each element when possible. We also calculated a variation constraint metric (range 0-1) for these regulatory regions and showed that genes controlled by constrained regions are enriched for disease-associated genes and essential genes from mouse knock-out screens.
The database also includes information from ENCODE TFBS and DNase peaks, ultra-conserved non-coding elements (UCNE), super-enhancers (dbSuper) and TAD domains (TAD-KB).
This release includes 5 files:
- GREEN-DB_v2.5.db.gz: The full database in SQLite format
- GRCh37_GREEN-DB.bed.gz[.csi]: An indexed BED file using GRCh37 genome coordinates, describing the regulatory regions and the associated information useful for variant annotation (controlled genes, closest gene/TSS, constraint metric).
- GRCh38_GREEN-DB.bed.gz[.csi]: An indexed BED file using GRCh38 genome coordinates, describing the regulatory regions and the associated information useful for variant annotation (controlled genes, closest gene/TSS, constraint metric).
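The SQLite database is distributed gzip-compressed, so it has to be decompressed before it can be opened. The table layout is not described in this record, so the following is only a minimal sketch that decompresses the file and lists whatever tables and columns it finds. The helper names `decompress_gz` and `list_tables` and the output file name are assumptions made for illustration, and the decompressed database will be considerably larger than the 8.9 GB download.

```python
import gzip
import shutil
import sqlite3
from pathlib import Path


def decompress_gz(src, dst):
    """Stream-decompress a .gz file to dst without loading it into memory."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return Path(dst)


def list_tables(db_path):
    """Return {table_name: [column names]} for every table in the database."""
    con = sqlite3.connect(db_path)
    try:
        names = [row[0] for row in con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )]
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        return {
            name: [col[1] for col in con.execute(f'PRAGMA table_info("{name}")')]
            for name in names
        }
    finally:
        con.close()


if __name__ == "__main__":
    # Output file name is a placeholder chosen for this example.
    db = decompress_gz("GREEN-DB_v2.5.db.gz", "GREEN-DB_v2.5.db")
    for table, columns in list_tables(str(db)).items():
        print(table, columns)
```

Listing the schema first avoids guessing table or column names before writing queries.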
To annotate a VCF file with information from GREEN-DB, you can use the BED files together with our tool GREEN-VARAN (https://github.com/edg1983/GREEN-VARAN).
For more information on GREEN-DB, please refer to our publication (https://doi.org/10.1101/2020.09.17.301960) and to the online documentation (https://green-varan.readthedocs.io/en/latest/).
GREEN-DB is free to use for academic users; please refer to the attached LICENSE file.
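For a quick look-up of a single position without running the full annotation tool, the BED files can also be scanned directly. The sketch below is not a replacement for GREEN-VARAN and makes only one assumption about the file: that the first three tab-separated columns are chromosome, start and end, as in any BED file. The function name `regions_overlapping` is hypothetical, the remaining columns are returned untouched, and the chromosome string must match the naming used in the file (with or without a `chr` prefix).

```python
import gzip


def regions_overlapping(bed_gz_path, chrom, pos):
    """Yield BED rows (as lists of fields) that overlap a 1-based position.

    BED intervals are 0-based and half-open, so a 1-based position `pos`
    lies inside a row when start < pos <= end.
    """
    with gzip.open(bed_gz_path, "rt") as handle:
        for line in handle:
            if line.startswith(("#", "track", "browser")):
                continue  # comment or browser/track header line
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3 or not (fields[1].isdigit() and fields[2].isdigit()):
                continue  # column header or malformed row
            if fields[0] == chrom and int(fields[1]) < pos <= int(fields[2]):
                yield fields


if __name__ == "__main__":
    # Chromosome and position are arbitrary example values.
    for row in regions_overlapping("GRCh38_GREEN-DB.bed.gz", "chr1", 1000000):
        print(row)
```

This is a linear scan over the whole file and does not use the `.csi` index, so it is only practical for a handful of positions.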
Changes from the previous version:
- We fixed an issue with alias symbol conversion that caused a small fraction of region-gene links to point to the wrong gene
- Due to the problem above, we removed any region-gene link where the region and the controlled gene were located on different chromosomes
- GREEN-DB now also includes TAD domain information from TAD-KB (http://dna.cs.miami.edu/TADKB/), and region-gene interactions are now annotated for occurrence within the same TAD
- An improved constraint metric model that now takes into account overlap with exonic regions
- In addition to the closest gene, an annotation for the closest TSS and its distance is now provided
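Two of the changes above, dropping region-gene links that span different chromosomes and flagging links that fall within the same TAD, can be illustrated with a small sketch. This is not the code used to build GREEN-DB. The `Interval` and `Link` types and all function names are hypothetical, and the sketch assumes that "within the same TAD" means both the region and the gene are fully contained in one TAD interval, which this record does not specify.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int
    end: int


class Link(NamedTuple):
    region: Interval   # regulatory region
    gene: Interval     # controlled gene
    gene_symbol: str


def drop_interchromosomal(links):
    """Keep only links whose region and gene lie on the same chromosome."""
    return [link for link in links if link.region.chrom == link.gene.chrom]


def contains(outer, inner):
    """True when `inner` lies entirely inside `outer` on the same chromosome."""
    return (
        outer.chrom == inner.chrom
        and outer.start <= inner.start
        and inner.end <= outer.end
    )


def same_tad(link, tads):
    """True when a single TAD contains both the region and the gene."""
    return any(contains(t, link.region) and contains(t, link.gene) for t in tads)


if __name__ == "__main__":
    # All coordinates and symbols below are made-up example values.
    tads = [Interval("chr1", 0, 1000)]
    links = [
        Link(Interval("chr1", 100, 200), Interval("chr1", 500, 900), "GENE_A"),
        Link(Interval("chr1", 100, 200), Interval("chr2", 500, 900), "GENE_B"),
    ]
    for link in drop_interchromosomal(links):
        print(link.gene_symbol, same_tad(link, tads))
```

A stricter or looser definition (for example, any overlap with the TAD rather than full containment) would only require changing `contains`.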
Files (9.1 GB)
MD5 checksum | Size |
---|---|
0bad211916b05a6880499045c159d6cb | 85.6 MB |
ea268a589e3dd2c327d809a5683f38c9 | 109.0 kB |
b773b1eb1ef6e03ccdea70dcf736a17f | 85.5 MB |
d8e74537fa48f454a0325ad5fc4fc8ac | 108.8 kB |
17f166d7ed4ada18ed8fa25410cc7cdd | 8.9 GB |
9e3ec86b01cca0759f05da79676ce931 | 69.4 kB |
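The MD5 checksums listed for each file can be used to confirm that a download completed without corruption. A minimal sketch using only the Python standard library follows; the function name `md5_of_file` is an assumption, and the path and expected checksum in the usage example are placeholders to be replaced with the values for the file actually downloaded.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Placeholders: substitute the downloaded file and its listed checksum.
    path = "path/to/downloaded_file"
    expected = "checksum-from-the-file-listing"
    print("OK" if md5_of_file(path) == expected else "MISMATCH")
```

Reading in chunks keeps memory use constant, which matters for the multi-gigabyte database archive.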
Additional details
Related works
- Is documented by
- Preprint: 10.1101/2020.09.17.301960 (DOI)