Published May 1, 2025 | Version v5
Dataset Open

Dissection of core promoter syntax through single nucleotide resolution modeling of transcription initiation (CLIPNET data)

  • 1. Cornell University

Description

This record contains the data needed to reproduce the figures in the CLIPNET paper (preprint here), as well as the processed data used to train and evaluate CLIPNET. To preserve the subdirectory structure, we've packaged the data into tar archives. For details on file contents, please refer to the README documents in our manuscript GitHub repo: https://github.com/Danko-Lab/clipnet_paper/
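Because the data ship as tar archives, a small helper can unpack one while keeping its subdirectory layout. The following is a minimal sketch using only the Python standard library; the function name `extract_archive` and the destination directory are illustrative assumptions, not part of the CLIPNET repository.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a tar archive into dest_dir and return the member names.

    Hypothetical helper for illustration; it is not part of the CLIPNET code.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect the compression (e.g. gzip) automatically.
    with tarfile.open(archive_path, "r:*") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support extraction filters that reject
            # unsafe members (absolute paths, links outside dest, etc.).
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names


# Example usage (the archive must already be downloaded; the output
# directory name is an assumption):
#   members = extract_archive("evaluation_data.tar.gz", "clipnet_data")
#   print(len(members), "members extracted")
```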
 
Pretrained CLIPNET models are archived separately at DOI 10.5281/zenodo.10408622
 
V5: Fixed a bug in the calculation of profile attribution scores that caused them to be off by a factor of exactly 500. The genome-wide DeepSHAP and TF-MoDISco tracks have been updated accordingly. I have not updated the individual examples, as these can be fixed quickly by multiplying by 500 when plotting. Additionally, I have uploaded profile and quantity motif calls, which contain genome-wide seqlet annotations. The columns in these files are [chrom, start, end, peak_idx, motif_annotation].
V4: Uploaded individual bigWigs. These have been lifted over from the original hg19 (GSE110638) to hg38 using CrossMap and then RPM normalized.
V3: Final version prior to journal submission. I don't recall the exact details of what changed.
V2: evaluation_metrics.tar.gz and evaluation_data.tar.gz have been replaced. Previously, we benchmarked the models by treating each peak in each individual as a separate data point. In this version, we instead predicted from the reference genome and compared against the averaged bigWigs.
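The version notes above mention three small operations that are easy to get wrong: reading the motif-call columns, applying the 500x correction to the individual examples, and RPM normalization. Below is a hedged sketch of each using only the standard library. It assumes the motif-call files are tab-delimited text with no header row (the notes give only the column order), and that RPM means signal scaled by one million over the total read count. All function names are hypothetical, and the exact read total used for the published bigWigs is not stated here.

```python
import csv

# Column order as listed in the V5 note.
MOTIF_CALL_COLUMNS = ["chrom", "start", "end", "peak_idx", "motif_annotation"]

# Correction factor stated in the V5 note.
PROFILE_ATTRIBUTION_FIX = 500


def read_motif_calls(path, delimiter="\t"):
    """Read a motif-call file into a list of dicts keyed by column name.

    Assumption: delimited text without a header row. peak_idx is kept as a
    string because its type is not documented in the notes.
    """
    calls = []
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            if not row or row[0].startswith("#"):
                continue  # skip blank lines and comment lines
            record = dict(zip(MOTIF_CALL_COLUMNS, row))
            record["start"] = int(record["start"])
            record["end"] = int(record["end"])
            calls.append(record)
    return calls


def rescale_profile_attributions(scores, factor=PROFILE_ATTRIBUTION_FIX):
    """Multiply un-updated profile attribution scores by the correction factor."""
    return [score * factor for score in scores]


def rpm_normalize(counts, total_reads):
    """Scale raw counts to reads per million (RPM)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    scale = 1_000_000 / total_reads
    return [count * scale for count in counts]
```

The rescaling applies only to the individual examples that were not regenerated for V5; the genome-wide tracks are already corrected and should not be multiplied again.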

Files

procap_library_prefixes.txt

Files (23.3 GB)

MD5 checksum                          Size
md5:d54187596554fdd9a200ea9270eeca67  4.3 GB
md5:5f7af4a8c27b235192685031a49dc1cb  727.7 MB
md5:9d2a8d0b6931b09dc1602b6a4257a4b1  59.7 MB
md5:eb5e484e8d157938b67f1717d95992b8  70.3 MB
md5:27d091d6be835d19820d1acd46f092c8  5.6 GB
md5:388d1d5c5b41c3d707f05d4a92d8cc0b  380.4 MB
md5:292e4eb0036b9d54ade4c6fa3f51bbb2  19.7 MB
md5:97c2aa8c4e190fbf7e55eae876dd61e3  2.4 kB
md5:d07213856b2bb1d58991a69a3e24a503  5.9 kB
md5:bda5010ed29ea28efb077b65848674aa  3.5 MB
md5:9dd13754815eaca81c8566dc4cbc3492  63.2 kB
md5:a2788d49a0635181b3be49e2d04c5e17  7.4 GB
md5:8d5e91af7ce57d56f349182239fe27ad  5.9 MB
md5:32631b1dac5858166c31e9c0b5e53d6d  941.5 MB
md5:36d43176de1cbd339c9ce4dec9471b1d  413.7 MB
md5:70ff66ab579961118a0fe533f0bb52d7  3.3 GB
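
The md5 values listed above can be used to confirm that a download completed without corruption. Below is a minimal sketch using Python's `hashlib`; the helper names are hypothetical, and the file is read in chunks so that multi-gigabyte archives do not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file's digest with a listed value such as 'md5:abc123...'."""
    # Drop an optional "md5:" prefix and normalize case and whitespace.
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Example usage (the local file name is an assumption; pair each file with
# the checksum shown next to it on the record page):
#   ok = matches_listed_md5("downloaded_file.tar.gz", "md5:<value from the listing>")
```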