Published September 13, 2022 | Version v1
Journal article | Open access

Current sequence-based models capture gene expression determinants in promoters but mostly ignore distal enhancers

  1. Department of Informatics, Technical University of Munich, Garching, Germany

Description

The largest sequence-based models of transcription control to date have been obtained by training them to predict genome-wide gene regulatory assays across the human genome. This setting is fundamentally correlative: during training, these models are exposed only to the sequence variation between human genes that arose through evolution. This raises the question of the extent to which they capture genuine causal signals. Here we compare the predictions of state-of-the-art models of transcription regulation with data from two large-scale observational studies and five deep perturbation assays.

Files (10.8 GB)

Name                    Size      MD5 checksum
SequenceBenchmark.zip   10.8 GB   09063e523173c49c017916d4bdba90b3
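Because the archive is large, a download can be truncated or corrupted without any obvious error. The following minimal sketch checks a local copy of SequenceBenchmark.zip against the MD5 checksum listed for it. It uses only Python's standard `hashlib` module. The helper names `md5_of_file` and `verify_download`, and the assumption that the archive was saved as `SequenceBenchmark.zip` in the working directory, are hypothetical and not part of the record.

```python
import hashlib
from pathlib import Path

# MD5 checksum listed on this record for SequenceBenchmark.zip.
EXPECTED_MD5 = "09063e523173c49c017916d4bdba90b3"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file (hypothetical helper).

    The file is read in 1 MiB chunks so that a 10.8 GB archive never
    has to fit in memory at once.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected MD5 checksum
    (hypothetical helper; the comparison is case-insensitive)."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the archive was saved.
    archive = Path("SequenceBenchmark.zip")
    if archive.exists():
        print("checksum OK" if verify_download(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```

Reading the file in fixed-size chunks keeps memory use constant regardless of archive size. A mismatch usually means the download should be repeated before unpacking.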