Published July 10, 2020 | Version 1.3
Dataset Open

Global consensus map of human transcription factor footprints

  • 1. Altius Institute for Biomedical Sciences

Description

Vierstra, J. et al. Global reference mapping of human transcription factor footprints. Nature 583, 729–736 (2020). https://doi.org/10.1038/s41586-020-2528-x

Preprint @ bioRxiv: https://doi.org/10.1101/2020.01.31.927798

Contact: Jeff Vierstra (jvierstra@altius.org)

Genomic DNase I footprinting enables quantitative, nucleotide-resolution delineation of sites of transcription factor occupancy within native chromatin. We combined sampling of >67 billion uniquely mapping DNase I cleavages from >240 human cell types and states to index, with unprecedented accuracy and resolution, human genomic footprints and thereby the sequence elements that encode transcription factor recognition sites.

Please see http://vierstra.org/resources/dgf for additional information and a complete set of raw DNase I data for individual datasets. Raw data can also be accessed via the ENCODE data portal (http://encodeproject.org) using the dataset accessions listed in Supplementary Table 1.

Code for footprint analysis and tutorials on how to access and manipulate digital genomic footprint data can be found at https://footprint-tools.readthedocs.io/en/latest/.

All files herein correspond to human genome build version GRCh38 (UCSC hg38).

Dataset contents:

  • Biosample metadata – Supplementary_Table_1.xlsx
  • Motif clustering metadata – Supplementary_Table_2.xlsx
  • ChIP-seq validation metadata – Supplementary_Table_3.xlsx
  • Consensus footprint coordinates and assigned motif archetypes
    TSV file (BED format) with the coordinates of consensus footprints (posterior probability > 0.99) and their overlaps with matches to motif model clusters. The legend file defines each column in detail.
    • consensus_footprints_and_motifs_hg38.bed.gz
    • consensus_footprints_and_motifs_legend.txt
  • Motif archetype matches overlapping consensus footprints
    TSV file (BED format) containing the coordinates of clustered motif model matches that overlap consensus footprints.
    • collapsed_motifs_overlaping_consensus_footprints.bed.gz
    • collapsed_motifs_overlapping_consensus_footprints_legend.txt
  • Footprint occupancy matrix of consensus footprints
    Rows are in the same order as in the consensus footprint file, and columns are in the same order as in the metadata files.
    • consensus_index_matrix_full_hg38.txt.gz (Values are –log(1-posterior))
    • consensus_index_matrix_binary_hg38.txt.gz (binary occupancy matrix, where footprints with posterior footprint probability >0.99 are considered occupied)
  • Single nucleotide variants tested for allelic imbalance 
    The legend file contains column definitions in detail.
    • genotypes.vcf.gz - Genotyping and allelic read depth for each biosample (see header for more information)
    • tested_snvs_padj.bed.gz - SNVs tested for imbalance (TSV, BED-format)
    • tested_snvs_padj_legend.txt
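
The relationship between the two occupancy matrices listed above can be illustrated with a short Python sketch. It converts between a posterior probability and the stored value –log(1-posterior), and applies the >0.99 posterior threshold to one row of values. The function names are hypothetical, the logarithm base is not stated in this record (natural log is assumed by default and exposed as a parameter), and whitespace-delimited rows are also an assumption; check the paper and the files themselves before relying on any of these. Under natural log, a posterior of 0.99 corresponds to a value of about 4.6; under base 10 it corresponds to 2.

```python
import math


def posterior_to_score(posterior, base=math.e):
    """Return -log_base(1 - posterior). The base is an assumption, not documented here."""
    return -math.log(1.0 - posterior, base)


def score_to_posterior(score, base=math.e):
    """Invert score = -log_base(1 - posterior)."""
    return 1.0 - base ** (-score)


def binarize_row(line, threshold=0.99, base=math.e):
    """Mark each value in one matrix row as occupied (1) or not (0).

    Assumes the row is a whitespace-delimited line of numeric values.
    A value counts as occupied when its posterior exceeds `threshold`.
    """
    cutoff = posterior_to_score(threshold, base)
    return [1 if float(value) > cutoff else 0 for value in line.split()]
```

Comparing against a precomputed cutoff on the stored scale avoids converting every value back to a probability, which loses precision when posteriors are very close to 1.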

Notes

This work was supported by NIH grants U54HG007010 and 5UM1HG009444.

Files (2.9 GB)

Checksum                                  Size
md5:c9abbcb6eab93b7912c6a033325e66d1      427.6 MB
md5:971d028d1649dd7d7c4320f95fd54dd5      799 Bytes
md5:1ab941adcda42e7b54cd93da89dd9723      584.0 MB
md5:35adfcbdf0e4d4fb6b2a15e2ad6deee4      1.3 kB
md5:15f3da13cd57217af1407e0271251ab6      65.5 MB
md5:099f7fc5a080afde2002646c8bfa3e7d      1.4 GB
md5:52096331fbfec008a1a198f888e81ca5      354.8 MB
md5:920e1eee2547cef2c4f235ae8eaba3ff      42.4 kB
md5:a8deccf73a5d4ee45bf03cfa8aed37de      35.3 MB
md5:29db108b5dd59232e8e7ad3fa77d0634      899 Bytes
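
A downloaded file can be checked against its listed MD5 checksum. The following Python sketch uses only the standard library; the function names are hypothetical, and the checksum string is assumed to be copied from the record page for the file in question, with or without the "md5:" prefix.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so multi-gigabyte files are never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file's MD5 with a listed value such as 'md5:<hex digest>'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again before use.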

Additional details

References

  • Vierstra, J. et al. Global reference mapping and dynamics of human transcription factor footprints. bioRxiv (2020). https://doi.org/10.1101/2020.01.31.927798