Dataset corresponding to "Widespread low-affinity motifs enhance chromatin accessibility and regulatory potential in mESCs"
Description
This Zenodo repository contains data supporting the work in "Widespread low-affinity motifs enhance chromatin accessibility and regulatory potential in mESCs" from the Zeitlinger lab. Questions about this dataset can be directed to Melanie Weilert (mweilert@stowers.org) or Julia Zeitlinger (jbz@stowers.org) at the Stowers Institute for Medical Research.
Below are descriptions of each file:
Motif coordinates:
For column descriptions, please refer to the analysis code at https://github.com/zeitlingerlab/Weilert_mESC_accessibility_2025/tree/main/2_analysis, in particular `5_curate_motifs.html` and `10_assign_predictions.html`.
+ all_instances_curated_0based_w_perturb.tsv.gz: Genomic motif mapping coordinates (0-based) of Oct4-Sox2, Sox2, Nanog, Klf4, and Zic3 mapped by the R1 mESC BPNet and ChromBPNet models described in this work.
+ all_xiong_instances_curated_0based_w_perturb.tsv.gz: Genomic motif mapping coordinates (0-based) of motifs mapped by the ZHBTc4 mESC ChromBPNet model (trained on data from Xiong et al. 2022 at the 0h timepoint, which reflects the wild-type Oct4 concentration).
+ all_pairs_curated_0based_w_perturb.tsv.gz: Motif pairs (0-based) of Oct4-Sox2, Sox2, Nanog, Klf4, and Zic3 motifs mapped by the R1 mESC BPNet and ChromBPNet models with the predicted response from all models upon combinatorial mutations of each motif in a motif pair (AB/A/B/none).
+ all_xiong_pairs_curated_0based_w_perturb.tsv.gz: Motif pairs (0-based) of motifs mapped by the ZHBTc4 mESC ChromBPNet model (trained on data from Xiong et al. 2022 at the 0h timepoint, which reflects the wild-type Oct4 concentration) with the predicted response from all models upon combinatorial mutations of each motif in a motif pair (AB/A/B/none).
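The four motif tables above are gzip-compressed, tab-separated files, so their columns can be inspected without decompressing them to disk. The following is a minimal sketch using only the Python standard library. It assumes the first line of each file is a header row, which is not stated in this description; the function name `peek_tsv_gz` is illustrative only.

```python
import csv
import gzip
from itertools import islice


def peek_tsv_gz(path, n_rows=5):
    """Return (header, first n_rows data rows) from a gzip-compressed TSV file.

    Assumption: the first line of the file is a header row.
    All values are returned as strings; no type conversion is attempted.
    """
    with gzip.open(path, mode="rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        rows = list(islice(reader, n_rows))
    return header, rows
```

For example, `header, rows = peek_tsv_gz("all_instances_curated_0based_w_perturb.tsv.gz")` would return the column names and the first five rows once the file has been downloaded. The meaning of each column is documented in the repository notebooks referenced above, not here.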
ChromBPNet and BPNet models:
The zipped file `models.tar.gz` contains the deep learning models trained in this work. BPReveal (v.4.0.4) was used to train these models using TensorFlow (v.2.15.0).
ChromBPNet and BPNet predictions:
The zipped file `preds.tar.gz` contains bigWig (.bw) files of the output prediction coverages from every trained BPNet and ChromBPNet model in this work, covering the ~151K regions of the mm10 genome used for training and interpretation.
DeepSHAP contribution scores:
The zipped file `shap.tar.gz` contains bigWig (.bw) files of the contribution scores generated by DeepSHAP (BPReveal implementation) from every trained BPNet and ChromBPNet model in this work, covering the ~151K regions of the mm10 genome used for training and interpretation.
TF-MoDISco:
The zipped file `modisco.tar.gz` contains TF-MoDISco results derived from the contribution scores. These include (1) modisco.h5 files, (2) logo PNG files, and (3) reports of the motifs found and their matching database examples. TF-MoDISco results are provided for every trained BPNet and ChromBPNet model in this work, covering the ~151K regions of the mm10 genome used for training and interpretation.
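Because several of the `.tar.gz` archives described above are large, it can help to list their contents before extracting anything. The sketch below uses Python's standard `tarfile` module; the function name `list_archive` and the default `limit` are illustrative choices, and the internal directory layout of the archives is not described here, so no specific member names are assumed.

```python
import tarfile


def list_archive(path, limit=20):
    """Return up to `limit` (name, size_in_bytes) pairs for the regular files
    in a gzip-compressed tar archive, without extracting them."""
    entries = []
    with tarfile.open(path, mode="r:gz") as archive:
        for member in archive:
            if member.isfile():
                entries.append((member.name, member.size))
                if len(entries) >= limit:
                    break
    return entries
```

Calling `list_archive("modisco.tar.gz")` on a downloaded copy would return the first twenty file entries. Reading the listing still requires streaming through the compressed archive, so it can take some time on the multi-gigabyte files.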
Files
(23.6 GB)
MD5 checksum | Size |
---|---|
md5:b415d959db219bcfa4566dbdd9d406f3 | 168.6 MB |
md5:c0e9f584ba7db20b9e38f643b38c94be | 630.6 MB |
md5:f727ecc1a4a00d3add375cb2fd387add | 58.6 MB |
md5:9b8e3e615e23b50e34513ddd62bb46bb | 83.1 MB |
md5:5527b6ea358ee8f077b6f5abfea04921 | 260.8 MB |
md5:a6116e629fbe0a962e8862d6276ec77b | 1.4 GB |
md5:9567f8efe1ae67ce54e533c1bae3b47e | 12.3 GB |
md5:65602b4e12a437104f956b3b72baf041 | 8.8 GB |
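The `md5:` values listed for each file can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib` module. It reads the file in chunks so that the multi-gigabyte archives do not need to fit in memory. The function names and the chunk size are illustrative choices.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 against an expected value.

    `expected` may be written as 'md5:<hex>' (as in the file listing)
    or as a bare hex string; case is ignored.
    """
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum("path/to/downloaded_file", "md5:b415d959db219bcfa4566dbdd9d406f3")` returns `True` only if the local file's digest equals the listed one; the path here is a placeholder for wherever the file was saved.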
Additional details
Software
- Repository URL: https://github.com/zeitlingerlab/Weilert_mESC_accessibility_2025/tree/main/2_analysis
- Programming language: Python, R