Published September 7, 2018 | Version v1
Dataset Open

FANTOM5 transcribed enhancers in mm10

  • 1. The Bioinformatics Centre, Department of Biology, University of Copenhagen, Copenhagen, Denmark

Description

Overview

Transcribed enhancers were identified and their expression was quantified across all mouse FANTOM5 libraries, using the FANTOM5 CAGE data re-aligned to mm10 (GRCm38) (obtained from http://fantom.gsc.riken.jp/5/datafiles/reprocessed/mm10_v1/basic/) and the decomposition-based peak identification (obtained from https://zenodo.org/record/545682#.WPuNy1Pyv2Q) by Hideya Kawaji.

Description

Transcribed enhancers were called based on balanced bidirectional RNA signatures as per Andersson et al. (2014). Enhancers were only identified distal to known exons (+/- 100 bp from exon boundaries) and transcription start sites (+/- 300 bp), as defined by the GENCODE vM7 annotation. In total, 44,138 enhancers were identified across 1,068 libraries, and their expression was quantified in each library. For details regarding the identification of transcribed enhancers from CAGE data, please see Andersson et al. (2014) and the blog post.
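The distal filter described above can be illustrated with a small sketch. This is not the pipeline used to build the dataset; the function name `is_distal`, the half-open coordinate convention, the single-chromosome simplification, and all coordinates are assumptions made for illustration. Only the padding values (100 bp around exons, 300 bp around transcription start sites) come from the description.

```python
def is_distal(start, end, exons, tss_positions, exon_pad=100, tss_pad=300):
    """Return True if the half-open interval [start, end) overlaps neither
    a padded exon nor a padded TSS. All features are assumed to lie on the
    same chromosome (a simplification for this sketch)."""
    for ex_start, ex_end in exons:
        # Padded exon: [ex_start - exon_pad, ex_end + exon_pad)
        if start < ex_end + exon_pad and end > ex_start - exon_pad:
            return False
    for tss in tss_positions:
        # TSS treated as the 1 bp interval [tss, tss + 1), then padded
        if start < tss + 1 + tss_pad and end > tss - tss_pad:
            return False
    return True


# Hypothetical annotation and candidate regions
exons = [(1000, 1200)]
tss_positions = [5000]
candidates = [(1250, 1400), (2000, 2300), (4800, 4900), (6000, 6400)]

distal = [c for c in candidates if is_distal(*c, exons, tss_positions)]
print(distal)
```

In this toy example the first candidate falls within 100 bp of the exon and the third within 300 bp of the TSS, so only the other two candidates are kept.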

Due to varying noise levels across FANTOM5 libraries and the intrinsically low expression levels of transcribed enhancers, library-specific noise levels were estimated to define a robust set of enhancers in each sample. In summary, for each library, expression was quantified in randomly sampled genomic regions distal to assembly gaps, DNase hypersensitive sites (ENCODE), known exons and gene TSSs (GENCODE vM7) to create a genomic background expression distribution. For each library, we called an enhancer active (used) if its expression was above the 99.9th quantile of the library’s genomic background expression distribution. The robust set of enhancers consists of those significantly expressed in at least one library.
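The per-library usage call can be sketched as follows. This is a minimal illustration and not the code used to produce the usage matrix: the function names, the nearest-rank quantile definition, and the toy counts are all assumptions. Only the rule itself (expression strictly above the 99.9th quantile of the library's background distribution) is taken from the description.

```python
import math


def empirical_quantile(values, q):
    """Nearest-rank empirical quantile (one of several common definitions;
    which definition the original analysis used is not stated)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def call_usage(enhancer_expr, background_expr, q=0.999):
    """Return the library's noise threshold and a 0/1 usage call per enhancer."""
    threshold = empirical_quantile(background_expr, q)
    usage = {eid: int(x > threshold) for eid, x in enhancer_expr.items()}
    return threshold, usage


# Hypothetical noisy library: some background regions carry CAGE tags
noisy_background = [0] * 9980 + [1] * 15 + [3] * 5
threshold, usage = call_usage({"e1": 0, "e2": 1, "e3": 4}, noisy_background)

# Hypothetical clean library: no tags in any background region
quiet_background = [0] * 10000
quiet_threshold, quiet_usage = call_usage({"e1": 0, "e2": 1}, quiet_background)
```

With the toy noisy background the threshold is one tag, so an enhancer with exactly one tag is not called. With the all-zero background the threshold is zero, and a single tag is enough for a call.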

While this approach ensures less permissive enhancer calling in noisy libraries, for some libraries the noise threshold is zero, meaning that a single CAGE tag is sufficient for calling an enhancer active. Furthermore, the ability to detect enhancer transcription depends on sequencing depth, so comparing the number of active enhancers per library may not be biologically meaningful when sequencing depths differ.

Data files
Each predicted enhancer is described in BED12 format with two blocks denoting the merged regions of transcription initiation on the minus and plus strands. The thickStart and thickEnd columns denote the inferred mid position between the blocks of transcription initiation events. Expression and usage matrices are tab delimited; the first row gives the FANTOM5 CNhs IDs and the first column gives the enhancer ID (same as column 4 in the BED file). Usage matrices contain zeroes and ones (0: not used, 1: used).

  • enhancers (BED12 format)
  • enhancer expression matrix (tab delimited, first row: CNhs IDs, first column: enhancer ID (coordinate))
  • binary enhancer usage matrix (0:not used, 1:used, tab delimited, first row: CNhs IDs, first column: enhancer ID (coordinate))
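The file layouts listed above could be read along these lines. This is a sketch under assumptions: the example BED12 line, the enhancer ID format `chr1:1000-1400`, and the library names `CNhs_A` and `CNhs_B` are made up, and whether the matrix header row has a cell above the enhancer ID column is not stated, so the reader accepts both layouts.

```python
import csv
import io


def parse_bed12_line(line):
    """Parse one BED12 line into a dict with absolute block coordinates."""
    f = line.rstrip("\n").split("\t")
    start = int(f[1])
    sizes = [int(s) for s in f[10].rstrip(",").split(",")]
    offsets = [int(s) for s in f[11].rstrip(",").split(",")]
    # BED block starts are offsets relative to the feature start
    blocks = [(start + o, start + o + s) for o, s in zip(offsets, sizes)]
    return {"chrom": f[0], "start": start, "end": int(f[2]),
            "id": f[3], "mid": int(f[6]), "blocks": blocks}


def read_matrix(handle, cast=int):
    """Read a tab-delimited matrix: first row library IDs, first column enhancer ID."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    libraries, rows = None, {}
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        if libraries is None:
            # Some tables omit the header cell above the row names; accept both
            libraries = header if len(header) == len(fields) - 1 else header[1:]
        rows[fields[0]] = dict(zip(libraries, (cast(v) for v in fields[1:])))
    return rows


# Hypothetical records, for illustration only
bed_line = "chr1\t1000\t1400\tchr1:1000-1400\t0\t.\t1185\t1186\t0,0,0\t2\t150,180\t0,220"
usage_text = "CNhs_A\tCNhs_B\nchr1:1000-1400\t1\t0\n"

enhancer = parse_bed12_line(bed_line)
usage = read_matrix(io.StringIO(usage_text))
print(enhancer["blocks"], usage[enhancer["id"]])
```

For the expression matrix, passing `cast=float` avoids assuming that the values are integers.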

Files

Files (7.6 MB)

Checksum Size
md5:a3a560191f76e7b3ce5b883901618472 1.3 MB
md5:b616b0fef03149394e56a7003c036e4b 3.8 MB
md5:43f661ff667981f2407ddf7aa25e67d1 2.5 MB
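A downloaded file can be checked against the MD5 checksums listed here. The sketch below uses Python's standard `hashlib`; the helper names are illustrative, and the path passed in is whatever local file name the download was saved under, since file names are not shown in this listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or plain hex."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_checksum(local_path, "md5:a3a560191f76e7b3ce5b883901618472")` would return True only if the file at `local_path` (a placeholder) is the 1.3 MB file from this record and was downloaded intact.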

Additional details

Related works

References
10.5281/zenodo.545682 (DOI)