There is a newer version of the record available.

Published May 20, 2020 | Version 1.0
Preprint | Open access

Massively multiplex single-molecule oligonucleosome footprinting

  • 1. Department of Biochemistry & Biophysics, University of California San Francisco, San Francisco, CA
  • 2. Department of Pediatrics, Stanford University, Palo Alto, CA
  • 3. Vector Institute, University of Toronto, Toronto, Canada
  • 4. Pacific Biosciences of California, Inc., Menlo Park, CA

Description

These are the intermediate data used in "Massively multiplex single-molecule oligonucleosome footprinting", in which the nonspecific adenine methyltransferase EcoGII was used to footprint accessible chromatin and the resulting methylation was read out on the Pacific Biosciences (PacBio) sequencing platform. The files are intermediate outputs: metrics derived from interpulse duration (IPD) values, and predictions of methylation status.

The .npy, .feather, and .pickle files are outputs of extractIPD.py and callNucPeaks.py; the .csv is the output of a cell in SAMOSA_analyses.ipynb. All of this code is available at https://github.com/RamaniLab/SAMOSA, including SAMOSA_analyses.ipynb, which contains all downstream analyses performed on these data.

 

meanIPDinfoChrControls.csv: This file contains the data used to generate Supplementary Figures 2 and 3, namely various summary measurements of the IPD values for each molecule in the in vitro samples.
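The CSV can be loaded directly with pandas. The sketch below assumes only that the file is a comma-separated table with a header row; the actual column names are not documented in this record, so it summarizes numeric columns generically instead of naming any. The helper name `summarize_ipd_table` is hypothetical and not part of the SAMOSA code.

```python
import pandas as pd


def summarize_ipd_table(path: str) -> pd.DataFrame:
    """Load a per-molecule IPD summary table and describe its numeric columns.

    Hypothetical helper: it assumes a header row and comma separators only,
    because the column names of meanIPDinfoChrControls.csv are not listed here.
    """
    df = pd.read_csv(path)
    # Keep numeric columns only; text columns (e.g. sample labels) are skipped
    numeric = df.select_dtypes(include="number")
    # count, mean, std, min, quartiles and max for each numeric column
    return numeric.describe()
```

Usage: `summarize_ipd_table("meanIPDinfoChrControls.csv")`. To compare samples, inspect `pd.read_csv(...).columns` first and group by whichever column actually holds the sample label.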

Files ending in _bingmm.npy: These contain, for the in vitro data, the posterior probability that each adenine is methylated. Files beginning with pbrun3 were sequenced on the Sequel I, and files beginning with pbrun4 or pbrun5 on the Sequel II. Apart from Supplementary Figures 2 and 3, the analyses in the paper are based on the Sequel II data. The naked_neg and DNA_minusM samples are negative controls; the naked_methyl and DNA_plusM samples are positive controls; the chromatin samples are in vitro assembled chromatin.
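A minimal sketch for working with the _bingmm.npy files is given below. The array layout is not documented in this record, so the code assumes, hypothetically, a 2-D float array with one row per molecule and NaN wherever no methylation call exists; check the loaded shape and dtype before relying on it. The helper names and the 0.5 threshold are illustrative choices, not part of the SAMOSA code.

```python
import numpy as np


def load_posteriors(path: str, allow_pickle: bool = False) -> np.ndarray:
    """Load a *_bingmm.npy file (hypothetical helper).

    Set allow_pickle=True only if the file turns out to hold an object
    array and you trust its source.
    """
    return np.load(path, allow_pickle=allow_pickle)


def fraction_methylated(post: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Per-molecule fraction of called adenines with posterior above `threshold`.

    ASSUMPTION: `post` is 2-D (molecules x positions) with NaN where no call
    exists. The real layout of the files is not documented here.
    """
    post = np.asarray(post, dtype=float)
    called = ~np.isnan(post)
    with np.errstate(invalid="ignore", divide="ignore"):
        # NaN > threshold is False, so uncalled positions never count as methylated
        methylated = (post > threshold) & called
        n_called = called.sum(axis=1)
        # Molecules with no called positions get NaN instead of a 0/0 result
        return np.where(n_called > 0, methylated.sum(axis=1) / n_called, np.nan)
```

Thresholding the posterior turns soft calls into binary ones; averaging the raw posteriors with `np.nanmean` is an equally reasonable alternative if soft values are preferred.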

pbrun4_gold_nuc47_chromatin_peaks.feather: The estimated nucleosome centers for the in vitro assembled chromatin, stored as a data frame in which each row is one predicted nucleosome dyad.
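Reading the .feather file with pandas requires the optional pyarrow dependency. Once loaded, a common use of per-molecule dyad calls is to measure spacing between neighboring nucleosomes. The sketch below does that, but the column names `zmw` and `dyad` are hypothetical placeholders, since the data frame's columns are not listed in this record; check `peaks.columns` and pass the real names.

```python
import numpy as np
import pandas as pd


def load_peaks(path: str) -> pd.DataFrame:
    # pandas.read_feather needs pyarrow installed
    return pd.read_feather(path)


def inter_dyad_distances(
    peaks: pd.DataFrame, mol_col: str = "zmw", pos_col: str = "dyad"
) -> np.ndarray:
    """Distances between consecutive predicted dyads on the same molecule.

    ASSUMPTION: `mol_col` and `pos_col` are placeholder column names.
    """
    ordered = peaks.sort_values([mol_col, pos_col])
    # Difference within each molecule; the first dyad of every molecule yields NaN
    gaps = ordered.groupby(mol_col)[pos_col].diff().dropna()
    return gaps.to_numpy()
```

A histogram of the returned distances gives a quick view of nucleosome spacing across the in vitro arrays.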

Files ending in _onlyT_zmwinfo.pickle: Each file contains a pandas data frame with information about every molecule sequenced in an in vivo sample. Read them in Python with the pandas read_pickle function. They require the same namespace that was used when they were saved, so pandas must be imported as pd and numpy as np. The neg samples are deproteinized unmethylated molecules, the pos samples are deproteinized methylated molecules, and the chromatin samples are methylated chromatin.
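A minimal loading sketch that follows the import convention stated above is shown next. The helper name `load_zmwinfo` and the type check are illustrative additions, not part of the SAMOSA code. Only unpickle files from a source you trust, because unpickling can execute arbitrary code.

```python
import numpy as np  # imported as np, as the dataset notes require
import pandas as pd  # imported as pd, as the dataset notes require


def load_zmwinfo(path: str) -> pd.DataFrame:
    """Read a *_onlyT_zmwinfo.pickle file into a DataFrame (hypothetical helper)."""
    df = pd.read_pickle(path)
    # read_pickle returns whatever object was pickled; confirm it is a DataFrame
    if not isinstance(df, pd.DataFrame):
        raise TypeError(f"expected a DataFrame, got {type(df).__name__}")
    return df
```

Usage: `zmwinfo = load_zmwinfo("<sample>_onlyT_zmwinfo.pickle")`, where `<sample>` stands for the prefix of an actual file in this record.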

Files ending in _bingmm.pickle: These files contain the posterior probability of methylation for each adenine in each DNA molecule in the sample. Each has a corresponding zmwinfo file (described above) and, likewise, must be read in with numpy imported as np. Each file is a dictionary whose keys are zero-mode waveguide (ZMW) hole numbers and whose values are numpy arrays with length equal to the unaligned circular consensus sequence (CCS) read of that molecule, holding methylation posterior probabilities at each A/T base. The 'zmw' column of the zmwinfo data frame can be used to match the per-molecule information there with the methylation information here.
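Because the bingmm dictionaries and the zmwinfo data frames share ZMW hole numbers, per-molecule summaries can be joined on the 'zmw' column. The sketch below reduces each posterior array to its mean and maps it onto the zmwinfo rows. It assumes, hypothetically, that non-A/T positions are stored as NaN; if they hold some other placeholder, mask them accordingly. It also assumes the dictionary keys and the 'zmw' column share a dtype. The helper names are hypothetical.

```python
import pickle

import numpy as np
import pandas as pd


def load_bingmm(path: str) -> dict:
    # Dictionary keyed by ZMW hole number; values are numpy arrays
    with open(path, "rb") as fh:
        return pickle.load(fh)


def mean_posterior_per_zmw(zmwinfo: pd.DataFrame, posteriors: dict) -> pd.DataFrame:
    """Attach each molecule's mean methylation posterior to its zmwinfo row.

    ASSUMPTION: non-A/T positions are NaN, so only A/T calls are averaged.
    """
    means = {}
    for zmw, arr in posteriors.items():
        arr = np.asarray(arr, dtype=float)
        called = arr[~np.isnan(arr)]
        means[zmw] = called.mean() if called.size else np.nan
    out = zmwinfo.copy()
    # Match through the 'zmw' column; ZMWs absent from the dictionary get NaN
    out["mean_posterior"] = out["zmw"].map(means)
    return out
```

Returning a copy leaves the loaded zmwinfo frame untouched, so the same frame can be joined against several summaries.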

Files (12.8 GB)

MD5 checksum                          Size
md5:1747a4e48bad7570b4ca2294af77d4da  26.4 MB
md5:7ede279d8cb9d3f353683872386a8ab4  116.8 MB
md5:704a1df683d86faf8a0f407b1215e6bf  108.8 MB
md5:e8220dd1080def92c03c55bea6d1c2af  48.9 MB
md5:45cd3b8612db50ca8607f19f9d177c0c  86.9 MB
md5:bc027385cb507f2f1c604d2e29aaba43  1.7 MB
md5:1392d54fe3dd8a154b073114268db8fb  668.6 MB
md5:fe8d2885afacd8446c2d0ec5b47b2ff7  39.2 MB
md5:f323f97373c8ac8d0ba30b48c8ebd10c  656.6 MB
md5:c8ea8eeb555ff5249cdd64755c0d92ab  36.1 MB
md5:14c07d067b39fde09af8db4ad69000ab  845.7 MB
md5:e22936e29bdcc774053ecba0693c5b3f  51.3 MB
md5:4decb46286427cf0c1400359287f335a  618.7 MB
md5:b0fb4afd5ae2e0d6fbc8ad00b693e0df  35.4 MB
md5:88922b07b0c2fca88665b06199a2c5f8  639.8 MB
md5:45f89895bcbb2705713ef942e942d3cb  37.5 MB
md5:7ab41844a64030aa21f305e9058d2421  1.1 GB
md5:5b4d2fe3dce5e2816886f970642a944c  70.0 MB
md5:49cfc92ae9379ce2d6fc3c141c7ff8b7  105.7 MB
md5:437957e215a8911c3297c9296aa9188b  142.5 MB
md5:b3a76466045c613c6f01b86f063254ae  3.3 GB
md5:7b24a4d0a166addb78c3e1e33cab633c  110.5 MB
md5:41322268b992bd5b4f826f3fbb87dba8  3.8 GB
md5:2630fb0ed7e4bcee7d740e1838268785  115.4 MB