Published July 5, 2023 | Version v3
Dataset Open

Matrices N, TEs vs promoters, weighted with L in [1e3kB, 1e10kB], hg19

Creators

  • 1. Swiss Federal Institute of Technology

Description

Related to "Statistical learning quantifies transposable element-mediated cis-regulation", Pulver et al. 2023

Regulatory susceptibility matrices N, with hg19 protein-coding genes as rows and TE subfamilies as columns. For any given gene, regulatory TEs lie strictly outside every promoter and every exon of that gene. The contribution of each individual TE is weighted by a Gaussian kernel centered on the closest promoter of that gene, with bandwidths spanning the range L = 1e3kB to 1e10kB.
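As a rough illustration of the weighting described above, the sketch below sums Gaussian kernel weights per TE subfamily for a single gene. The kernel form exp(-d^2 / (2 L^2)), the lack of normalization, the function names and the toy data are all assumptions made for illustration; the record does not specify them, and this is not the authors' implementation.

```python
import math
from collections import defaultdict


def gaussian_weight(distance, bandwidth):
    """Unnormalized Gaussian kernel (assumed form); equals 1 at distance 0."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def susceptibility_row(te_hits, bandwidth):
    """Sum kernel weights per TE subfamily for one gene.

    te_hits: iterable of (subfamily, distance_to_closest_promoter) pairs,
    assumed to be already restricted to TEs outside the gene's promoters
    and exons. Distances and bandwidth must share the same unit.
    """
    row = defaultdict(float)
    for subfamily, distance in te_hits:
        row[subfamily] += gaussian_weight(distance, bandwidth)
    return dict(row)


# Hypothetical toy data, not taken from the dataset.
toy_hits = [("MER41B", 0.0), ("MER41B", 2.5e5), ("L1PA2", 1.0e6)]
row = susceptibility_row(toy_hits, bandwidth=2.5e5)
```

With this form, a TE located one bandwidth away from the promoter contributes exp(-0.5) of the weight of a TE located at the promoter itself, so larger values of L let distal TEs contribute more.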

The "TAD_restricted" N matrices were built by using TAD boundaries to cap the distance up to which TEs were considered putative cis-regulatory elements for protein-coding genes. Gene-TE pairs that do not overlap TADs are weighted irrespective of TAD boundaries.
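One possible reading of the TAD restriction can be sketched as a filter applied to each gene-TE pair before weighting. The interpretation used here is an assumption: a pair is restricted only when both the promoter and the TE fall inside a TAD, in which case it is kept only if they share the same TAD. The function names, the single-chromosome coordinates and the half-open intervals are hypothetical.

```python
def find_tad(position, tads):
    """Return the index of the TAD containing position, or None.

    tads: list of (start, end) half-open intervals on one chromosome.
    """
    for index, (start, end) in enumerate(tads):
        if start <= position < end:
            return index
    return None


def keep_pair(promoter_pos, te_pos, tads):
    """Decide whether a gene-TE pair is retained under the TAD restriction."""
    promoter_tad = find_tad(promoter_pos, tads)
    te_tad = find_tad(te_pos, tads)
    if promoter_tad is None or te_tad is None:
        # Assumption: pairs not covered by TADs are weighted as usual.
        return True
    return promoter_tad == te_tad


# Hypothetical toy TADs, not taken from the dataset.
toy_tads = [(0, 1000), (2000, 3000)]
same_tad = keep_pair(100, 900, toy_tads)
across_tads = keep_pair(100, 2500, toy_tads)
outside_tad = keep_pair(1500, 2500, toy_tads)
```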

The "weighted_mappability" N matrices contain only selected subfamilies and should be incorporated into the N_weighted matrix computed with L = 2.5e5kB before use. The low- and high-mappability fractions of subfamilies were separated by a median split on per-integrant average mappability scores.
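The median split mentioned above can be illustrated as follows. This is a minimal sketch: the integrant identifiers and scores are made up, and assigning integrants whose score equals the median to the low fraction is an assumption, since the record does not say how ties are handled.

```python
import statistics


def median_split(scores):
    """Split integrants of one subfamily into low and high mappability.

    scores: dict mapping integrant id -> average mappability score.
    Integrants with a score equal to the median go to the low fraction
    (assumed tie rule).
    """
    cutoff = statistics.median(scores.values())
    low = sorted(name for name, score in scores.items() if score <= cutoff)
    high = sorted(name for name, score in scores.items() if score > cutoff)
    return low, high


# Hypothetical toy scores, not taken from the dataset.
toy_scores = {"te_1": 0.2, "te_2": 0.9, "te_3": 0.5, "te_4": 0.7}
low, high = median_split(toy_scores)
```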

Files

Files (3.8 GB)

Checksum Size
md5:4fb54db948f2866de4967872124dfe8a 2.2 GB
md5:4e153ede101d8a0df5bddc6c9691bc5a 4.6 MB
md5:361d5c86c68e4f0e9c4897f966390005 1.5 GB
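After downloading, a file can be checked against the MD5 checksums listed above. The sketch below uses only Python's standard hashlib module and reads the file in chunks, which matters for multi-gigabyte files. The helper names and the example path are hypothetical; the actual file names are not shown in this listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


# Usage (hypothetical path):
# matches_checksum("downloaded_file", "md5:4fb54db948f2866de4967872124dfe8a")
```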