Published October 29, 2019 | Version v3
Dataset Open

Index and biological spectrum of accessible DNA elements in the human genome

  • 1. Altius Institute for Biomedical Sciences

Description

Data associated with the manuscript titled
"Index and biological spectrum of accessible DNA elements in the human genome"
https://doi.org/10.1101/822510

Code repositories for these data are available here:

  • https://github.com/Altius/Index
  • https://github.com/Altius/Vocabulary


Tab-separated file with DNase I Hypersensitive Site (DHS) coordinates,
including DHS summits and core regions and assignments to regulatory components.
A separate legend file describes the contents of each column in more detail.

  • DHS_Index_and_Vocabulary_hg38_WM20190703.txt.gz
  • DHS_Index_and_Vocabulary_hg19_WM20190703.txt.gz (mapped using liftOver, not ideal)
  • DHS_Index_and_Vocabulary_legend.txt
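As a minimal sketch, the index file can be streamed without decompressing it to disk, using only the Python standard library. The sketch assumes the first line of the file is a header row. No column names are assumed; they should be taken from DHS_Index_and_Vocabulary_legend.txt. The function name iter_dhs_rows is hypothetical.

```python
import csv
import gzip
from typing import Dict, Iterator


def iter_dhs_rows(path: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per DHS from a gzip-compressed, tab-separated file.

    Assumption: the first line is a header row naming the columns.
    """
    with gzip.open(path, mode="rt", newline="") as handle:
        # QUOTE_NONE keeps any quote characters in the fields untouched.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            yield row


if __name__ == "__main__":
    # Show the column names and the first record, then stop reading.
    rows = iter_dhs_rows("DHS_Index_and_Vocabulary_hg38_WM20190703.txt.gz")
    first = next(rows)
    print(list(first.keys()))
    print(first)
```

Because the function is a generator, only the rows actually consumed are decompressed, which keeps memory use low for a file of this size.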

Metadata files describing biosample characteristics and annotations,
provided in HTML, PDF, TSV and Excel formats:

  • DHS_Index_and_Vocabulary_metadata.html
  • DHS_Index_and_Vocabulary_metadata.pdf
  • DHS_Index_and_Vocabulary_metadata.tsv
  • DHS_Index_and_Vocabulary_metadata.xlsx

Presence/absence matrix of DHSs (rows) versus biosamples (columns),
provided in RData, MatrixMarket and raw formats:

  • dat_bin_FDR01_hg38.RData
  • dat_bin_FDR01_hg38.mtx.gz
  • dat_bin_FDR01_hg38.txt.gz
  • dat_bin_FDR01_hg19.RData (mapped using liftOver, not ideal)
  • dat_bin_FDR01_hg19.txt.gz (mapped using liftOver, not ideal)
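The dimensions of the MatrixMarket file can be checked before loading the full matrix. The sketch below reads only the header and assumes the file follows the standard MatrixMarket layout (a banner line, optional comment lines starting with "%", then a size line). The function name read_mtx_shape is hypothetical.

```python
import gzip
from typing import Tuple


def read_mtx_shape(path: str) -> Tuple[int, int]:
    """Return (n_rows, n_cols) from a gzip-compressed MatrixMarket file.

    Only the header is read, so this stays cheap for large matrices.
    """
    with gzip.open(path, mode="rt") as handle:
        banner = handle.readline()
        if not banner.startswith("%%MatrixMarket"):
            raise ValueError("missing MatrixMarket banner line")
        for line in handle:
            # Comment lines start with '%'; blank lines are skipped too.
            if line.startswith("%") or not line.strip():
                continue
            fields = line.split()
            return int(fields[0]), int(fields[1])
    raise ValueError("no size line found")


if __name__ == "__main__":
    # Rows are DHSs and columns are biosamples, per the description above.
    n_dhs, n_biosamples = read_mtx_shape("dat_bin_FDR01_hg38.mtx.gz")
    print(n_dhs, n_biosamples)
```

If the matrix shares its row order with the DHS Index file, the first number should equal the number of DHS records in that file, which makes for a quick sanity check after download.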

Normalized DNase-seq signal matrix of DHSs (rows) versus biosamples (columns),
provided in RData and raw formats:

  • dat_FDR01_hg38.RData
  • dat_FDR01_hg38.txt.gz

The order of DHSs (rows) is the same as in the DHS Index file(s) above,
and the order of biosamples (columns) is the same as in the metadata files.

Non-negative Matrix Factorization (NMF) results, decomposing the presence/absence matrix (hg38) into 16 components:

  • 2018-06-08NC16_NNDSVD_Mixture.npy.gz
  • 2018-06-08NC16_NNDSVD_Basis.npy.gz
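For orientation, NMF approximates a non-negative matrix by the product of two smaller non-negative factors. The notation below is generic and introduced here for illustration: the symbols X, W, H, n, m and k are assumptions, not names used in the files. Which file holds which factor, and in which orientation (the factorization may have been applied to the transposed matrix), should be confirmed from the array shapes after loading.

```latex
X \approx W H, \qquad
X \in \{0,1\}^{n \times m}, \quad
W \in \mathbb{R}_{\ge 0}^{n \times k}, \quad
H \in \mathbb{R}_{\ge 0}^{k \times m}, \quad
k = 16
```

Here n is the number of DHSs and m the number of biosamples, matching the rows and columns of the presence/absence matrix; k = 16 is the number of components stated above.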

Putative transcription factor-specific regulatory elements,
identified using DHS Vocabulary components, TF motif databases and biosample-specific footprinting data:

  • TF_associated_DHSs_hg38.tar.gz

Files (2.1 GB)

Checksum                               Size
md5:0f964f78ec5c43046ab0a2098ecc0402   34.9 kB
md5:70cc7126addcfb5a70684a743c8ec45b   129.3 MB
md5:fc2eb85a6b08530850a2bf2157004bae   206.0 MB
md5:0ffc5a51c48e8064304a8fa8fe2ea253   88.1 MB
md5:86e061601e863f3923688d8bc3eeab1c   205.1 MB
md5:56686f286831842a4dfed273193f33fe   223.3 MB
md5:c4b8feff1cc714fee78586f8e27052bc   88.4 MB
md5:7fcd3303eaa04be9c122e25ff6548b1c   620.5 MB
md5:d695013808d6dce69f93f7d01ee1b4d2   415.3 MB
md5:d28da654512535f37af58e7f69c954b0   52.5 MB
md5:ccfdf2f7b1aaac3d2901ce47faf842bb   90.8 MB
md5:53086795a8eaa4fbce1a947689c3d3a9   691 Bytes
md5:aabd5cb3a03ccd771d67333ec8700770   1.3 MB
md5:8c21f0a16fb5217cdfda7b44284d3d3a   1.0 MB
md5:74cf095bcaf59d6ac83724eb0a3eadd6   221.7 kB
md5:5da578249d73c7ff2288a3420c52bd8b   183.7 kB
md5:46bb81d210fc7e3b676366ab98bdab12   1.1 kB
md5:dead072a14ff56a6291fad992d2d40ae   3.4 MB
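A downloaded file can be checked against its listed MD5 value with a short script. This is a sketch using only the Python standard library; the function names md5_of_file and matches_checksum are hypothetical. The listing above does not pair checksums with file names, so the pairing has to be taken from the Zenodo record itself.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, listed: str) -> bool:
    """Compare a file against a listed value such as 'md5:<hex digest>'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[4:]
    return md5_of_file(path) == expected
```

A typical call would be matches_checksum(local_path, listed_value), where listed_value is the "md5:..." string shown for that file on the record page; a False result suggests an incomplete or corrupted download.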