Published November 14, 2019 | Version v1
Journal article Open

Identifying statistically significant chromatin contacts from Hi-C data with FitHiC2

  • 1. Department of Bioengineering, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
  • 2. Division of Vaccine Discovery, La Jolla Institute for Immunology, La Jolla, CA 92037, USA

Description

Fit-Hi-C is a software tool that computes statistical confidence estimates for Hi-C contact maps in order to identify significant chromatin contacts. By fitting a monotonically non-increasing spline, Fit-Hi-C captures the relationship between genomic distance and contact probability without any parametric assumption. The spline fit, together with the correction of contact probabilities for bin- or locus-specific biases, accounts for previously characterized covariates that affect Hi-C contact counts. Fit-Hi-C is best applied to the study of mid-range (e.g., 20kb – 2Mb for the human genome) intra-chromosomal contacts. However, with the latest reimplementation, named FitHiC2, it is possible to perform genome-wide analysis of high-resolution Hi-C data, including all intra-chromosomal distances and inter-chromosomal contacts. FitHiC2 also offers a merging filter module, which eliminates indirect/bystander interactions and leads to a significant reduction in the number of reported contacts without sacrificing recovery of key loops such as those between convergent CTCF binding sites. Here we describe how to apply the FitHiC2 protocol to three use cases: (i) 5kb resolution Hi-C data of chromosome 5 from GM12878 (a human lymphoblastoid cell line), (ii) 40kb resolution whole-genome Hi-C data from IMR90 (a human lung fibroblast cell line), and (iii) budding yeast whole-genome Hi-C data at single restriction cut site (EcoRI) resolution. The procedure takes ~10 hours when all use cases are run sequentially (~4h when run in parallel). With the recent improvements in its implementation, FitHiC2 (8 processors and 16GB RAM) is also scalable to genome-wide analysis of the highest resolution (1kb) Hi-C data available to date (~48h with 32GB peak memory). FitHiC2 is available through Bioconda, GitHub and the Python Package Index.
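The idea of a distance-dependent null model can be illustrated with a minimal Python sketch. It is not the FitHiC2 code: it swaps the spline for a simpler monotone non-increasing fit (pool-adjacent-violators isotonic regression), omits bias correction and multiple-testing correction, and uses a binomial upper tail as the confidence estimate, which is an assumption of this sketch. The toy contact map, the bin count, and all function names are hypothetical.

```python
import math

def monotone_nonincreasing_fit(values, weights):
    """Weighted least-squares fit that never increases (pool-adjacent-violators).

    Stand-in for the spline described above; NOT the FitHiC2 implementation.
    """
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for value, weight in zip(values, weights):
        blocks.append([value, weight, 1])
        # Merge while the newest block exceeds the one before it.
        while len(blocks) > 1 and blocks[-1][0] > blocks[-2][0]:
            v2, w2, n2 = blocks.pop()
            v1, w1, n1 = blocks.pop()
            blocks.append([(v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), by direct summation (toy sizes only)."""
    if k <= 0:
        return 1.0
    tail = sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
    return min(1.0, tail)

# Hypothetical toy contact map: 6 bins, counts decay with distance (in bins).
N_BINS = 6
background = {1: 10, 2: 5, 3: 3, 4: 1, 5: 1}
contacts = {(i, j): background[j - i]
            for i in range(N_BINS) for j in range(i + 1, N_BINS)}
contacts[(0, 4)] = 12  # one artificially enriched pair

total = sum(contacts.values())
distances = list(range(1, N_BINS))
possible_pairs = [N_BINS - d for d in distances]
observed = [sum(c for (i, j), c in contacts.items() if j - i == d) for d in distances]

# Average contact probability per pair at each distance, then the monotone fit.
raw_prob = [obs / (total * pairs) for obs, pairs in zip(observed, possible_pairs)]
fitted_prob = monotone_nonincreasing_fit(raw_prob, possible_pairs)

# Upper-tail p-value of each observed count under the fitted null probability.
p_values = {pair: binomial_upper_tail(count, total, fitted_prob[pair[1] - pair[0] - 1])
            for pair, count in contacts.items()}

for pair in sorted(p_values, key=p_values.get)[:3]:
    print(pair, contacts[pair], f"{p_values[pair]:.3g}")
```

In this toy map the enriched pair inflates the raw average at its distance, and the monotone fit pools that distance with its shorter neighbour so that the expected probability never rises with distance.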

Notes

Data supporting the upgraded protocol of FitHiC version 2 (denoted as FitHiC2)

Files

Submitted_Data_FitHiC2.zip (172.4 MB)
md5:eb551652a4d293df3c7c218658da6f34
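
A downloaded copy of the archive can be checked against the MD5 checksum listed above. The following is a minimal sketch using only the Python standard library; the local path is an assumption (the archive saved under its listed name in the current directory), and the helper name `md5_of_file` is hypothetical.

```python
import hashlib
import os

EXPECTED_MD5 = "eb551652a4d293df3c7c218658da6f34"  # checksum from the file listing
ARCHIVE_PATH = "Submitted_Data_FitHiC2.zip"        # assumed local download location

def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Only runs the comparison if the archive is present at the assumed path.
if os.path.exists(ARCHIVE_PATH):
    actual = md5_of_file(ARCHIVE_PATH)
    print("checksum OK" if actual == EXPECTED_MD5 else f"checksum mismatch: {actual}")
```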

Additional details

Related works

Is supplemented by
10.24433/CO.5589539.v2 (DOI)