File: GM12878_200kb_10000str.hss
Description: Population of 10,000 full diploid genome structures of the human lymphoblastoid GM12878 cell line at 200-kb resolution, generated using the Integrated Genome Modeling (IGM) software developed in the Alber lab at UCLA. For IGM, see https://github.com/alberlab/igm.

The file is an HDF5 container. Its main dataset, coordinates, is a 3D array holding the xyz coordinates of all 10,000 structures. Here is a short example that reads the coordinates and radii in Python using h5py:

```
import h5py

filename = 'GM12878_200kb_10000str.hss'

# Open read-only; the file is closed automatically on leaving the block.
with h5py.File(filename, 'r') as h5f:
    coordinates = h5f['coordinates'][:]  # [#beads] x [#structures] x 3
    radii = h5f['radii'][:]
```

coordinates is a 3D float32 array of shape [#beads] × [#structures] × 3: the 3D coordinates of the genome population. The xyz coordinates of all beads in structure i can be retrieved using coordinates[:, i, :].

The file can also be read using alabtools; for more information, see https://github.com/alberlab/alabtools.

File: GM12878_200kb_stfeat.txt
Description: Text file containing 17 structural features derived from the 3D models for each 200-kb region in the genome. Each row is a 200-kb region, and each column is a structural feature. The columns are:
- chrom: chromosome
- start: start position
- end: end position
- domain: whether this region was restrained during modeling (domain: restrained; cen: unrestrained and not used in downstream analysis)
- radial: mean radial position (RAD)
- radvar: cell-to-cell variability of radial positions
- ilf: interior localization frequency (ILF)
- rg: local chromatin fiber decompaction (nm) (RG)
- rgvar: cell-to-cell variability of local decompaction
- spdist: mean speckle distance (nm) (SpD)
- spdistvar: cell-to-cell variability of speckle distances
- saf: speckle association frequency (SAF)
- speckletsaseq: predicted SON TSA-seq (S-TSA)
- nucdist: mean nucleoli distance (nm) (NuD)
- nucdistvar: cell-to-cell variability of nucleoli distances
- naf: nucleoli association frequency
- nucleolitsaseq: predicted nucleoli TSA-seq (N-TSA)
- laf: lamina association frequency
- laminatsaseq: predicted lamin TSA-seq (L-TSA)
- transint: interchromosomal contact probability (ICP)
- transabratio: median trans A/B ratio
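
As a rough sketch, the feature table could be loaded with pandas and restricted to the regions that were restrained during modeling. This assumes the text file is whitespace-delimited and has a header row with the column names listed above; neither detail is stated here, so adjust `sep` and `header` if the file differs. The helper name `load_restrained_features` is hypothetical, and the rows in the demonstration are synthetic placeholders, not values from the data set.

```python
import io

import pandas as pd


def load_restrained_features(source):
    """Read the feature table and keep only regions restrained during modeling."""
    # Assumption: whitespace-delimited text with a header row of column names.
    table = pd.read_csv(source, sep=r"\s+")
    # Rows labeled "cen" were unrestrained and are not used downstream.
    return table[table["domain"] == "domain"].reset_index(drop=True)


# Synthetic placeholder rows (not real data), trimmed to a few columns.
example = io.StringIO(
    "chrom start end domain radial\n"
    "chr1 0 200000 domain 0.5\n"
    "chr1 200000 400000 cen 0.6\n"
    "chr1 400000 600000 domain 0.7\n"
)
features = load_restrained_features(example)
print(features[["chrom", "start", "end", "radial"]])
```

For the real file, pass its path instead of the in-memory buffer, for example `load_restrained_features('GM12878_200kb_stfeat.txt')`.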