Dataset Open Access

Global consensus map of human transcription factor footprints

Vierstra, Jeff; Stamatoyannopoulos, John A

Vierstra, J. et al. Global reference mapping of human transcription factor footprints. Nature 583, 729–736 (2020). https://doi.org/10.1038/s41586-020-2528-x

Preprint @ bioRxiv: https://doi.org/10.1101/2020.01.31.927798

Contact: Jeff Vierstra (jvierstra@altius.org)

Genomic DNase I footprinting enables quantitative, nucleotide-resolution delineation of sites of transcription factor occupancy within native chromatin. We combined sampling of >67 billion uniquely mapping DNase I cleavages from >240 human cell types and states to index, with unprecedented accuracy and resolution, human genomic footprints and thereby the sequence elements that encode transcription factor recognition sites.

Please see http://vierstra.org/resources/dgf for additional information and a complete set of raw DNase I data for individual datasets. Raw data can also be accessed via the ENCODE data portal (http://encodeproject.org) using the dataset accessions listed in Supplementary Table 1.

Code for footprint analysis and tutorials on how to access and manipulate digital genomic footprint data can be found at https://footprint-tools.readthedocs.io/en/latest/.

All files herein correspond to human genome build version GRCh38 (UCSC hg38).

Dataset contents:

  • Biosample metadata – Supplementary_Table_1.xlsx
  • Motif clustering metadata – Supplementary_Table_2.xlsx
  • ChIP-seq validation metadata – Supplementary_Table_3.xlsx
  • Consensus footprint coordinates and assigned motif archetypes
    TSV file (BED-format) with consensus footprint (posterior probability>0.99) coordinates and overlaps with matches to motif model clusters. The legend file contains column definitions in detail.
    • consensus_footprints_and_motifs_hg38.bed.gz
    • consensus_footprints_and_motifs_legend.txt
  • Motif archetype matches overlapping consensus footprints
    TSV file (BED-format) containing the coordinates of clustered motif model matches that overlap consensus footprints. The legend file contains column definitions in detail.
    • collapsed_motifs_overlapping_consensus_footprints_hg38.bed.gz
    • collapsed_motifs_overlapping_consensus_footprints_legend.txt
  • Footprint occupancy matrix of consensus footprints
    Rows are in the same order as the consensus footprint file, and columns are in the same order as the biosamples in the metadata files.
    • consensus_index_matrix_full_hg38.txt.gz (Values are –log(1-posterior))
    • consensus_index_matrix_binary_hg38.txt.gz (binary occupancy matrix, where footprints with posterior footprint probability >0.99 are considered occupied)
  • Single nucleotide variants tested for allelic imbalance 
    The legend file contains column definitions in detail.
    • genotypes.vcf.gz - Genotyping and allelic read depth for each biosample (see header for more information)
    • tested_snvs_padj.bed.gz - SNVs tested for imbalance (TSV, BED-format)
    • tested_snvs_padj_legend.txt
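
The relationship between the two occupancy matrices can be illustrated with a short sketch. The full matrix stores –log(1-posterior) and the binary matrix marks entries with posterior > 0.99 as occupied, so a score can be mapped back to a posterior, and the cutoff can be expressed on the score scale. The base of the logarithm is not stated on this page; the natural logarithm used as the default below is an assumption, and the function names are illustrative rather than part of any published tool.

```python
import math


def posterior_from_score(score, log_base=math.e):
    """Invert score = -log(1 - posterior) for the given log base.

    log_base is an assumption: the page only says "log".
    """
    return 1.0 - log_base ** (-score)


def score_threshold(posterior_cutoff=0.99, log_base=math.e):
    """Score that corresponds to a given posterior cutoff."""
    return -math.log(1.0 - posterior_cutoff, log_base)


def binarize_row(scores, posterior_cutoff=0.99, log_base=math.e):
    """Mark entries whose implied posterior exceeds the cutoff as occupied (1)."""
    cutoff = score_threshold(posterior_cutoff, log_base)
    return [1 if s > cutoff else 0 for s in scores]


if __name__ == "__main__":
    # Hypothetical scores for one footprint across three biosamples.
    row = [0.5, 5.0, 4.0]
    print(binarize_row(row))               # assuming natural log
    print(binarize_row(row, log_base=10))  # assuming base-10 log
```

Under the natural-log assumption the cutoff is about 4.6 on the score scale; under a base-10 assumption it is about 2. Comparing a few rows of the full matrix against the binary matrix is one way to check which convention applies.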

This work was supported by NIH grants U54HG007010 and 5UM1HG009444.
Files (2.9 GB)

Name                                                           Size       MD5
collapsed_motifs_overlapping_consensus_footprints_hg38.bed.gz  427.6 MB   c9abbcb6eab93b7912c6a033325e66d1
collapsed_motifs_overlapping_consensus_footprints_legend.txt   799 Bytes  971d028d1649dd7d7c4320f95fd54dd5
consensus_footprints_and_motifs_hg38.bed.gz                    584.0 MB   1ab941adcda42e7b54cd93da89dd9723
consensus_footprints_and_motifs_legend.txt                     1.3 kB     35adfcbdf0e4d4fb6b2a15e2ad6deee4
consensus_index_matrix_binary_hg38.txt.gz                      65.5 MB    15f3da13cd57217af1407e0271251ab6
consensus_index_matrix_full_hg38.txt.gz                        1.4 GB     099f7fc5a080afde2002646c8bfa3e7d
genotypes.vcf.gz                                               354.8 MB   52096331fbfec008a1a198f888e81ca5
Supplementary_Table_1.xlsx                                     42.4 kB    920e1eee2547cef2c4f235ae8eaba3ff
Supplementary_Table_2.xlsx                                     42.4 kB    920e1eee2547cef2c4f235ae8eaba3ff
Supplementary_Table_3.xlsx                                     42.4 kB    920e1eee2547cef2c4f235ae8eaba3ff
tested_snvs_padj.bed.gz                                        35.3 MB    a8deccf73a5d4ee45bf03cfa8aed37de
tested_snvs_padj_legend.txt                                    899 Bytes  29db108b5dd59232e8e7ad3fa77d0634
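
The MD5 checksums in the file listing can be used to confirm that a download completed intact. The following is a minimal sketch using only the Python standard library; the helper names are illustrative, and the local file path in the usage note assumes the file was saved under its listed name in the working directory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed_md5):
    """Compare a file against a listed checksum, with or without the 'md5:' prefix."""
    expected = listed_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `matches_listed_md5("tested_snvs_padj.bed.gz", "md5:a8deccf73a5d4ee45bf03cfa8aed37de")` should return `True` for an intact copy of that file.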
  • Vierstra et al. Global reference mapping and dynamics of human transcription factor footprints. (2020). bioRxiv
