FANTOM5 transcribed enhancers in hg38
Creators
- 1. The Bioinformatics Centre, Department of Biology, University of Copenhagen, Copenhagen, Denmark
Description
Overview
Transcribed enhancers were identified, and their expression quantified, across all human FANTOM5 libraries. The analysis used FANTOM5 CAGE data re-aligned to hg38 (GRCh38) (obtained from http://fantom.gsc.riken.jp/5/datafiles/reprocessed/hg38_v1/basic/) and decomposition-based peak identification (obtained from https://zenodo.org/record/545682#.WPuNy1Pyv2Q) by Kawaji, Hideya.
Description
Transcribed enhancers were called based on bidirectional balanced RNA signatures as per Andersson et al. (2014). Enhancers were only identified distal to known exons (+/-100bp from exon boundaries) and transcription start sites (+/-300bp), as defined by the GENCODE v24 annotation. In total, 63,285 enhancers were identified across 1,829 libraries. Expression was quantified and TPM (tags per million) normalised according to the total number of mapped reads within the full set of tag clusters (TCs). For details regarding the identification of transcribed enhancers from CAGE data, please see Andersson et al. (2014) and the associated blog post.
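
The TPM scaling described above can be sketched as follows. This is a minimal illustration, not the pipeline that produced the files: the function name `tpm_normalise` and the example numbers are hypothetical, and the per-library totals are assumed to be the total mapped tags within the full set of TCs, which must be supplied separately because they are not the column sums of the enhancer matrix.

```python
def tpm_normalise(counts, library_totals):
    """Scale raw tag counts to tags per million (TPM).

    counts: list of rows (one per enhancer), each a list of raw tag
        counts with one entry per library.
    library_totals: total mapped tags per library (assumed here to be
        the total within the full set of tag clusters).
    """
    for total in library_totals:
        if total <= 0:
            raise ValueError("library totals must be positive")
    normalised = []
    for row in counts:
        if len(row) != len(library_totals):
            raise ValueError("row length does not match number of libraries")
        # Multiply first so small integer inputs stay exact in floating point.
        normalised.append([c * 1e6 / t for c, t in zip(row, library_totals)])
    return normalised


# Hypothetical counts for two enhancers in two libraries.
example = tpm_normalise([[2, 0], [5, 10]], [1_000_000, 2_000_000])
print(example)  # [[2.0, 0.0], [5.0, 5.0]]
```

Because the denominator is library-wide, TPM values of enhancers in one library will generally not sum to one million.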
Due to varying noise levels across FANTOM5 libraries and the intrinsically low expression levels of transcribed enhancers, library-specific noise levels were estimated to define a robust set of enhancers in each sample. In summary, for each library, expression was quantified in randomly sampled genomic regions distal to assembly gaps, DNase hypersensitive sites (ENCODE), known exons and gene TSSs (GENCODE) to create a genomic background expression distribution. For each library, we called an enhancer active (used) if its expression was above the 99.9th quantile of the library’s genomic background expression distribution. The robust set consists of 60,215 enhancers across 1,829 libraries, each significantly expressed in at least one library.
While this approach ensures less permissive enhancer calling in noisy libraries, for some libraries the noise threshold is zero, meaning that a single CAGE tag is sufficient for calling an enhancer active. Furthermore, the ability to detect enhancer transcription depends on sequencing depth, so the number of active enhancers per library may not be biologically meaningful to compare between libraries of different sequencing depth.
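
The thresholding rule described above can be illustrated with a short sketch. The function names, the example values and the linear-interpolation quantile definition are assumptions made for illustration; the source does not state which quantile estimator was used or how many background regions were sampled.

```python
import math


def quantile(values, q):
    """Quantile with linear interpolation between order statistics.

    The interpolation scheme is an assumption; the source does not say
    which quantile definition was used.
    """
    if not values:
        raise ValueError("no background values supplied")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def call_active(enhancer_expr, background_expr, q=0.999):
    """Return the noise threshold and a used/not-used flag per enhancer.

    An enhancer is flagged as used only if its expression is strictly
    above the q-th quantile of the library's background distribution.
    """
    threshold = quantile(background_expr, q)
    return threshold, [x > threshold for x in enhancer_expr]


# Hypothetical noisy library: 20 of 10,000 background regions carry 3 tags.
noisy_background = [0.0] * 9980 + [3.0] * 20
print(call_active([0, 3, 4], noisy_background))  # (3.0, [False, False, True])

# Hypothetical clean library: the threshold is zero, so one tag suffices.
clean_background = [0.0] * 1000
print(call_active([0, 1], clean_background))  # (0.0, [False, True])
```

The second call reproduces the caveat in the text: when the background is empty of signal, the threshold is zero and a single tag is enough to mark an enhancer as used.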
Data files
Each predicted enhancer is described in BED12 format with two blocks denoting the merged regions of transcription initiation on the minus and plus strands. The thickStart and thickEnd columns denote the inferred midpoint between the blocks of transcription initiation events. Expression and usage matrices are tab delimited: the first row gives the FANTOM5 CNhs IDs and the first column gives the enhancer ID (same as column 4 in the BED file). Usage matrices contain zeroes and ones (0:not used, 1:used).
- enhancers (BED12 format)
- enhancer expression matrix (tab delimited, first row: CNhs IDs, first column: enhancer ID (coordinate))
- enhancer expression matrix, TPM normalised (tab delimited, first row: CNhs IDs, first column: enhancer ID (coordinate))
- binary enhancer usage matrix (0:not used, 1:used, tab delimited, first row: CNhs IDs, first column: enhancer ID (coordinate))
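
As a rough sketch of how these files might be read, the example below parses one BED12 record and one tab-delimited matrix using only the standard library. The record, the enhancer ID format, the CNhs IDs and the helper names are all made up for illustration. Whether the matrix header row carries a label above the ID column is not stated in the description, so the reader handles both layouts.

```python
import csv
import io
from typing import NamedTuple


class Enhancer(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str
    midpoint: int   # thickStart column
    blocks: list    # absolute (start, end) of each block


def parse_bed12_line(line):
    """Parse one BED12 record; blockStarts are relative to chromStart."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 12:
        raise ValueError("expected 12 tab-separated BED fields")
    start, end = int(f[1]), int(f[2])
    # BED allows a trailing comma in the block lists, hence the filter.
    sizes = [int(x) for x in f[10].split(",") if x]
    offsets = [int(x) for x in f[11].split(",") if x]
    blocks = [(start + o, start + o + s) for o, s in zip(offsets, sizes)]
    return Enhancer(f[0], start, end, f[3], int(f[6]), blocks)


def read_matrix(handle):
    """Read a tab-delimited matrix into (library IDs, {enhancer ID: values}).

    Values are kept as strings; convert them as needed.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    rows = {row[0]: row[1:] for row in reader if row}
    first = next(iter(rows.values()), [])
    # Drop the first header cell only if it labels the ID column.
    libraries = header[1:] if len(header) == len(first) + 1 else header
    return libraries, rows


# Made-up record and matrix, not taken from the actual files.
bed_line = "chr1\t1000\t1400\tchr1:1000-1400\t0\t.\t1200\t1201\t0,0,0\t2\t150,180\t0,220"
usage_text = "CNhs00001\tCNhs00002\nchr1:1000-1400\t0\t1\n"

enhancer = parse_bed12_line(bed_line)
libraries, usage = read_matrix(io.StringIO(usage_text))
used_in = [lib for lib, v in zip(libraries, usage[enhancer.name]) if v == "1"]
print(enhancer.blocks)  # [(1000, 1150), (1220, 1400)]
print(used_in)          # ['CNhs00002']
```

Joining on the BED name column works because the first column of each matrix is described as identical to column 4 of the BED file.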
Files (110.3 MB)
Checksum | Size |
---|---|
md5:19939df010e17997477dd4b5f61683e4 | 1.7 MB |
md5:f7aa970e19a572c75f9c5b2d8267b567 | 15.1 MB |
md5:64577f832d14a784428878cff5b6a35e | 89.7 MB |
md5:7feff92546a359239c2545316fddf35e | 3.8 MB |
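
A downloaded file can be checked against the MD5 checksums listed above with a few lines of Python. This is a minimal sketch; the helper names are hypothetical, and the pairing of checksums to file names is not given here, so the path passed in must be whichever file the checksum was listed for.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum written as 'md5:<hex>' or plain hex."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `matches_checksum(path, "md5:19939df010e17997477dd4b5f61683e4")` returns `True` only if the file at `path` is byte-identical to the 1.7 MB file published with that checksum.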