Software Open Access

A multi-task graph clustering approach for chromosome conformation capture datasets identifies conserved modules of chromosomal interactions

Alireza Fotuhi Siahpirani; Ferhat Ay; Sushmita Roy

The three-dimensional organization of the genome is emerging as a major determinant of gene regulation. Advances in chromosome conformation capture (3C) methods such as 4C, 5C and Hi-C are expanding the repertoire of datasets measuring three-dimensional organization for multiple cell types and organisms. An important challenge is to develop computational methods that interrogate these data to reveal the three-dimensional organization of the genome and to understand how it changes across different cell types and species. In this work we introduce a novel approach based on spectral clustering and multi-task clustering on graphs that exploits shared information between different Hi-C datasets and compares genome-wide contact count maps across different cell types and species. We apply our approach to four recently generated Hi-C datasets from human and mouse cell lines to define chromosomal conformation clusters. Compared to a clustering method that examines each dataset independently, our approach can find more biologically consistent patterns of conservation and divergence. We observe that the majority of our clusters are conserved between cell lines/species and show significant enrichment for similar genomic signals such as gene content, activating and repressive chromatin marks, and DNase I hypersensitive sites. We find that the majority of the divergence happens between clusters with similar chromatin state; however, there are exceptions. These exceptional regions are associated with differential presence of Lamina Associated Domains (LADs) and open chromatin footprints when comparing different species, and with differential binding of architectural proteins such as CTCF when comparing different cell lines. In summary, our clustering-based approach provides a systematic way to compare Hi-C contact count maps across multiple cell types and organisms while accounting for the graph-based nature of chromosome conformation capture datasets.
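
To make the spectral clustering idea concrete, the following is a minimal sketch of clustering genomic bins from a single contact count matrix. It is not the authors' implementation and it omits the multi-task step that shares information across datasets. The function name `spectral_clusters`, the toy matrix and the choice of the symmetric normalized Laplacian with a plain k-means step are all assumptions made for illustration.

```python
import numpy as np

def spectral_clusters(contacts, k, n_iter=50):
    """Cluster bins of a symmetric contact count matrix (illustrative sketch only)."""
    w = np.asarray(contacts, dtype=float)
    w = (w + w.T) / 2.0                      # enforce symmetry
    np.fill_diagonal(w, 0.0)                 # ignore self-contacts
    deg = w.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}
    lap = np.eye(len(w)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)            # eigenvalues returned in ascending order
    emb = vecs[:, :k]                        # eigenvectors of the k smallest eigenvalues
    emb = emb / np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12)

    # Plain k-means with a deterministic farthest-point initialisation.
    centers = [emb[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((emb - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(emb[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((emb[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = emb[labels == j].mean(axis=0)
    return labels

# Hypothetical 6-bin contact map with two densely interacting groups of bins.
toy = np.array([
    [0, 9, 8, 1, 0, 1],
    [9, 0, 7, 0, 1, 0],
    [8, 7, 0, 1, 0, 1],
    [1, 0, 1, 0, 9, 8],
    [0, 1, 0, 9, 0, 7],
    [1, 0, 1, 8, 7, 0],
])
labels = spectral_clusters(toy, k=2)
```

In this toy case bins 0 to 2 and bins 3 to 5 should receive different labels. A multi-task variant would additionally couple the cluster assignments of corresponding regions across cell lines or species, which this sketch does not attempt.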

This software is also available at: https://bitbucket.org/roygroup/arboretum-hic

Files (242.2 MB)
Name: roygroup-arboretum-hic-4295e5023188.zip
Size: 242.2 MB
md5: 679174d4ddd9e4ed2386044cf65b3ed7
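
The md5 checksum listed for the archive can be used to confirm that a download is complete. The snippet below is a small sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `archive_is_intact` are hypothetical, and the path passed to them is assumed to point at a local copy of the zip file.

```python
import hashlib

# Checksum as listed for roygroup-arboretum-hic-4295e5023188.zip on this record.
EXPECTED_MD5 = "679174d4ddd9e4ed2386044cf65b3ed7"

def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def archive_is_intact(path, expected=EXPECTED_MD5):
    """Compare the file's md5 digest with the expected checksum."""
    return md5_of_file(path) == expected
```

For example, `archive_is_intact("roygroup-arboretum-hic-4295e5023188.zip")` would return `True` only if the local file matches the published checksum.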
