
Raw and processed SPT data for "Guided nuclear exploration increases CTCF target search efficiency"

Hansen, Anders S; Amitai, Assaf; Cattoglio, Claudia; Tjian, Robert; Darzacq, Xavier


Anders S. Hansen [1,2,3,4,*], Assaf Amitai [5,*], Claudia Cattoglio [1,2,3,4], Robert Tjian [1,2,3,4], Xavier Darzacq [1,2,3]


1: Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, USA;

2: Li Ka Shing Center for Biomedical and Health Sciences

3: CIRM Center of Excellence, University of California, Berkeley, Berkeley, USA;

4: Howard Hughes Medical Institute, University of California, Berkeley, Berkeley, USA;

5: Department of Chemical Engineering, MIT, Cambridge 02139, Massachusetts, USA

*: ASH and AA contributed equally.



This repository contains all the raw and processed spaSPT data associated with “Guided nuclear exploration increases CTCF target search efficiency”. In total, this dataset contains data for 1,669 single-cell movies and trajectories with 85,612,029 unique displacements. In this ReadMe file we provide the following information:

  • The cell lines used in this study.
  • How the data was collected.
  • How to read the raw single-cell spaSPT data.
  • How to read the processed, quality-controlled and HMM-classified data.

The document does not contain information about how the data was analyzed. For details on how the data was analyzed and for raw code to reproduce our figures, please go to


Cell lines and transient transfection constructs

In total, 20 different cell lines were studied here. The mouse embryonic stem cells (mESCs) are in the JM8.N4 background (Pettitt et al., 2009) and the human cells were U2OS osteosarcoma cells. Cell lines expressing Halo-tagged proteins were either homozygous knock-in cell lines, wild-type cell lines transiently transfected with a plasmid over-expressing the protein of interest (Lipofectamine 3000), or wild-type cell lines stably over-expressing a transgene.

The wild-type and C59 mESC cell lines were pathogen tested using the IMPACT II test (performed by IDEXX BioResearch) and were negative for all tested pathogens (full details provided in (Hansen et al., 2017)). The wild-type and C32 U2OS cell lines were tested for mycoplasma contamination and found to be clean, and were further authenticated by Short Tandem Repeat profiling (STR profiling; 100% match with U2OS; full details provided in (Hansen et al., 2017)).

Further details on each cell line or transfection construct are provided in the table in the ReadMe PDF (the MAT-file name is the name of the MAT-file containing the raw data).

Data collection using stroboscopic photo-activation single-particle tracking (spaSPT)

For full details on how the experiments were performed, please see the Materials and Methods section of the manuscript. Here we briefly describe the protocol.

To systematically investigate how proteins explore the nucleus, we need a very large dataset (i.e. hundreds of thousands of trajectories) of very high quality SPT data (i.e. with minimal bias and few tracking errors). The two major sources of bias in experimental SPT data are 1) motion-blurring and 2) tracking errors (Hansen et al., 2018).

First, motion-blurring biases against detecting fast-moving molecules: while the camera is exposed, a fast-moving molecule spreads its photons over many pixels and no longer resembles a diffraction-limited spot, whereas a bound or slow-moving molecule emits all its photons as a single diffraction-limited spot. Since most localization algorithms readily detect diffraction-limited spots but not motion-blurs, this introduces a clear bias. spaSPT overcomes this bias by strobing the excitation laser (Elf et al., 2007): using just a single 1 ms excitation pulse, we simultaneously achieve high signal-to-background (~10-fold) and minimize motion-blur, as previously demonstrated (for a more complete discussion, please see (Hansen et al., 2018)).

Second, tracking errors are largely caused by high particle densities: for example, when trajectories overlap, tracking errors occur, which prevents accurate analysis of protein diffusion and could lead to artefactual conclusions. By using the bright photo-activatable Janelia Fluor dyes PA-JF549 and PA-JF646 (Grimm et al., 2016), we can overcome this challenge (this is also known as sptPALM (Manley et al., 2008)). During the camera integration time between frames (~447 microseconds), we pulse the 405 nm photo-activation laser at an intensity such that the mean localization density is around 1 molecule per nucleus per frame. At this density, tracking errors are greatly reduced, yet because we can collect tens of thousands of frames per cell, we still obtain large amounts of data.

For a more complete discussion of spaSPT, please see (Hansen et al., 2017, 2018).

To generate a large dataset at multiple time-scales, we collected spaSPT data at 3 camera exposure times: 4 ms, 7 ms and 13 ms, each followed by ~447 microseconds between frames; this roughly corresponds to frame rates of ~223 Hz, 133 Hz and 74 Hz. We generally imaged around 8 cells per replicate and four biological replicates; occasionally a cell was removed from subsequent analysis if its localization density was too high and thus prone to tracking errors. To generate data at longer lag times (this was only done after HMM-classification; see also main paper), this data was subsampled according to the table in the ReadMe PDF.
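The relationship between exposure time and frame rate can be checked with a few lines of Python. This is only a sketch: it assumes that the frame interval is simply the exposure time plus the ~447 microseconds quoted above, and the names INTERFRAME_TIME_S and frame_rate_hz are illustrative, not part of the dataset or its analysis code. The computed values land within a few Hz of the nominal labels (~223 Hz, 133 Hz, 74 Hz), which the text describes as rough.

```python
# Time between frames quoted in the text (~447 microseconds), in seconds.
INTERFRAME_TIME_S = 447e-6


def frame_rate_hz(exposure_ms, interframe_time_s=INTERFRAME_TIME_S):
    """Approximate frame rate, assuming interval = exposure + inter-frame time."""
    frame_interval_s = exposure_ms * 1e-3 + interframe_time_s
    return 1.0 / frame_interval_s


# The three camera exposure times used for this dataset.
for exposure_ms in (4, 7, 13):
    print(f"{exposure_ms} ms exposure: ~{frame_rate_hz(exposure_ms):.1f} Hz")
```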

Overview of raw, unprocessed single-cell spaSPT data

This repository contains all the raw, unprocessed spaSPT data for each single cell in the directory “UnprocessedSingleCellData”. In total, this directory contains 1669 MAT-files corresponding to the 1669 single-cell movies recorded here. The files are systematically named for their parent cell line, frame rate, replicate number and cell number. For example, “mESC_C59_Halo-mCTCF_133Hz_Rep4_Cell8.mat” contains SPT data for Halo-CTCF in mouse embryonic stem cell line clone 59, imaged at ~133 Hz; the file is for cell number 8 in biological replicate number 4. All the other single-cell experiments are similarly named. Each MAT-file contains all the trajectories recorded from a single-cell movie, stored as a “structure array” object, “trackedPar”. trackedPar contains three fields for each trajectory:

  • trackedPar.xy: “xy” is a matrix with 2 columns and a number of rows corresponding to the number of frames where the molecule was located. The first column is the x-coordinate and the second column is the y-coordinate and the units are micrometers.
  • trackedPar.Frame: “Frame” is a column vector where each element is an integer describing the frame number wherein the particle was localized (each trajectory contains at most 1 gap between frames).
  • trackedPar.TimeStamp: “TimeStamp” is a column vector where each element is the timepoint (in units of seconds) where the molecule was localized.
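For readers working outside Matlab, the following Python sketch shows one way the file naming scheme and the trackedPar structure described above might be read. It rests on several assumptions: the regular expression is inferred from the single example file name given above and may not match every file; the function names parse_filename and load_tracked_par are hypothetical; and scipy.io.loadmat only reads MAT-files up to version 7.2, so files saved in the HDF5-based v7.3 format would need a different reader.

```python
import re

# Assumed pattern, inferred from the one example file name in the text:
# <cell line>_<rate>Hz_Rep<replicate>_Cell<cell>.mat
FILENAME_RE = re.compile(
    r"^(?P<cell_line>.+)_(?P<frame_rate_hz>\d+)Hz"
    r"_Rep(?P<replicate>\d+)_Cell(?P<cell>\d+)\.mat$"
)


def parse_filename(name):
    """Split a single-cell MAT-file name into its labelled parts."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    info = match.groupdict()
    for key in ("frame_rate_hz", "replicate", "cell"):
        info[key] = int(info[key])
    return info


def load_tracked_par(path):
    """Return a list of dicts with the xy, Frame and TimeStamp of each trajectory."""
    # Imported lazily so that parse_filename works without NumPy/SciPy installed.
    import numpy as np
    from scipy.io import loadmat

    mat = loadmat(path, squeeze_me=True, struct_as_record=False)
    trajectories = []
    # atleast_1d covers the case of a movie containing a single trajectory.
    for traj in np.atleast_1d(mat["trackedPar"]):
        trajectories.append({
            # reshape restores the 2-column layout for 1-localization tracks
            "xy": np.asarray(traj.xy, dtype=float).reshape(-1, 2),  # micrometers
            "Frame": np.atleast_1d(traj.Frame).astype(int),
            "TimeStamp": np.atleast_1d(traj.TimeStamp).astype(float),  # seconds
        })
    return trajectories


print(parse_filename("mESC_C59_Halo-mCTCF_133Hz_Rep4_Cell8.mat"))
```

A call such as load_tracked_par("UnprocessedSingleCellData/mESC_C59_Halo-mCTCF_133Hz_Rep4_Cell8.mat") would then give one dictionary per trajectory, assuming the file sits at that relative path.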

Note that this Matlab format is directly readable by Spot-On:


Overview of HMM-classified and processed SPT data

The raw spaSPT data was processed (to remove any rare tracking errors) and all the data merged. The data was then classified into either a “bound” or “free” state using a 2-state Hidden Markov Model (HMM; vbSPT (Persson et al., 2013)) and then temporally subsampled to generate data at longer lag times (i.e. lower frame rates). Full details on how the data was processed are given in the Materials and Methods section as well as on GitLab. Raw code to reproduce all the figures from unprocessed single-cell spaSPT data is also available at GitLab:

All the HMM-classified data for each cell line at each frame rate is stored in a single MAT-file. The MAT-file contains two key cell arrays: “CellTracks” and “CellTrackViterbiClass”. The two cell arrays contain the XY-coordinates and classification, respectively. More specifically, suppose trajectory k is made up of n localizations. Then:

CellTracks{k}: a matrix of n rows and 2 columns containing the x,y coordinates in units of micrometers.

CellTrackViterbiClass{k}: a column vector of length n-1 where each entry is an integer: either “1” or “2”. Thus, the length of this vector is 1 less than the number of rows in the CellTracks matrix. This is because only the displacements and not the localizations are classified. For example, if CellTrackViterbiClass{k} reads [1;1;1;2;2], it means that there were 6 localizations and that the first 3 displacements (1→2, 2→3, 3→4) were classified as “bound” and the last 2 displacements (4→5, 5→6) were classified as “free”.




Elf, J., Li, G.-W., and Xie, X.S. (2007). Probing transcription factor dynamics at the single-molecule level in a living cell. Science 316, 1191–1194.

Grimm, J.B., English, B.P., Choi, H., Muthusamy, A.K., Mehl, B.P., Dong, P., Brown, T.A., Lippincott-Schwartz, J., Liu, Z., Lionnet, T., et al. (2016). Bright photoactivatable fluorophores for single-molecule imaging. Nat. Methods 13, 985–988.

Hansen, A.S., Pustova, I., Cattoglio, C., Tjian, R., and Darzacq, X. (2017). CTCF and cohesin regulate chromatin loop stability with distinct dynamics. Elife 6.

Hansen, A.S., Woringer, M., Grimm, J.B., Lavis, L.D., Tjian, R., and Darzacq, X. (2018). Robust model-based analysis of single-particle tracking experiments with Spot-On. Elife 7, e33125.

Manley, S., Gillette, J.M., Patterson, G.H., Shroff, H., Hess, H.F., Betzig, E., and Lippincott-Schwartz, J. (2008). High-density mapping of single-molecule trajectories with photoactivated localization microscopy. Nat. Methods 5, 155–157.

Persson, F., Lindén, M., Unoson, C., and Elf, J. (2013). Extracting intracellular diffusive states and transition rates from single-molecule tracking data. Nat. Methods 10, 265–269.

Pettitt, S.J., Liang, Q., Rairdan, X.Y., Moran, J.L., Prosser, H.M., Beier, D.R., Lloyd, K.C., Bradley, A., and Skarnes, W.C. (2009). Agouti C57BL/6N embryonic stem cells for mouse genetic resources. Nat. Methods 6, 493–495.

Teves, S.S., An, L., Hansen, A.S., Xie, L., Darzacq, X., and Tjian, R. (2016). A dynamic mode of mitotic bookmarking by transcription factors. Elife 5.



