Published September 14, 2021 | Version 1.0
Dataset Open

Code and Data for "Multiple re-reads of single proteins at single-amino-acid resolution using nanopores"

  • 1. TU Delft
  • 2. U Illinois, Urbana-Champaign

Description

The primary data structures, which contain both the data and the analysis products, are peptidereads_fig2.mat (for figure 2) and peptiderereads_fig3.mat (for figure 3). The main analysis scripts for these structures are callvariants_fig2.m and reread_analysis_fig3.m, respectively. Data for figures S6 (S6_reread_data.dat) and S8 (S8_hetero_data.dat), and the analysis script used to produce figure S6 (S6_reread_analysis.m), are also included. The remaining files are dependencies of these main scripts.

The fields in peptidereads_fig2 are as follows:

folder, eventnum, reducedStart, reducedEnd, suspicious, hasreread: notes for internal use

variant: the true identity of the single-amino-acid substitution variant

data: the ion current data for each read downsampled to 5 kHz.

omit: whether the read was omitted from analysis due to length

relativeDNAend: the index in the data where the DNA portion of the read ends.

relativeLinkerEnd: the index in the data where the linker portion of the read ends.

DNAlevels, Peplevels, Alllevels: extracted ion current levels for the DNA region, the peptide region, and the entire read, respectively.

cal: the multiplicative and additive constants applied to calibrate the read

caldata: the data with calibration constants applied

cons0D, cons0W, cons0G, cons0DNA: initial guesses for consensuses based on hand curation of data.

pepDcons0, pepWcons0, pepGcons0: the portion of the handmade consensus with the variant levels.

pepDcons, pepWcons, pepGcons: the portion of the iterated consensus with the variant levels.

inhandconsensus: whether the read was used in generation of the initial guess consensuses.

inconsensus: whether the read was used in generation of either the initial guess or iterated consensuses.

confidence: the relative likelihood assigned to each variant for the read

incalls: whether the data was used in variant calling (i.e., not used in consensus generation)

params: the analysis parameters used
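
As a rough illustration of how relativeDNAend, relativeLinkerEnd, and cal relate to the data and caldata fields, the sketch below splits one trace into segments and applies a linear calibration. It rests on several assumptions that are not stated in this description: that both indices are 1-based MATLAB indices counted from the start of data, that the read is laid out as DNA, then linker, then peptide, and that calibration has the form scale * data + offset. The function names and sample values are hypothetical; callvariants_fig2.m remains the reference for the actual conventions.

```python
def split_read(data, dna_end, linker_end):
    """Split one trace into DNA, linker, and peptide segments.

    dna_end and linker_end are treated as 1-based, inclusive MATLAB
    indices counted from the start of `data` (an assumption).
    """
    # A 1-based inclusive end index N equals a 0-based exclusive slice end N.
    dna = data[:dna_end]
    linker = data[dna_end:linker_end]
    peptide = data[linker_end:]
    return dna, linker, peptide


def apply_calibration(data, scale, offset):
    """Apply an assumed linear calibration: scale * x + offset."""
    return [scale * x + offset for x in data]


# Hypothetical 10-sample trace, not taken from the dataset.
trace = [0.30, 0.31, 0.29, 0.40, 0.41, 0.20, 0.22, 0.21, 0.25, 0.24]
dna, linker, peptide = split_read(trace, dna_end=3, linker_end=5)
calibrated = apply_calibration(trace, scale=2.0, offset=0.1)
```

If the stored indices turn out to be relative to some other reference point, only the slicing in split_read needs to change.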

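The variant, confidence, incalls, and omit fields can be combined to score variant calls. The following is a minimal sketch of that idea, not the procedure in callvariants_fig2.m: it assumes confidence holds one value per variant in the order D, W, G (labels taken from the cons0D, cons0W, and cons0G field names), that variant is stored as one of those letters, and that the call is simply the variant with the highest relative likelihood. The records shown are made up for illustration.

```python
VARIANTS = ("D", "W", "G")  # assumed ordering of the confidence values


def call_variant(confidence):
    """Return the variant label with the highest relative likelihood."""
    best = max(range(len(confidence)), key=lambda i: confidence[i])
    return VARIANTS[best]


# Hypothetical reads, not taken from the dataset.
reads = [
    {"variant": "D", "confidence": [0.90, 0.05, 0.05], "incalls": True, "omit": False},
    {"variant": "W", "confidence": [0.20, 0.70, 0.10], "incalls": True, "omit": False},
    {"variant": "G", "confidence": [0.60, 0.10, 0.30], "incalls": True, "omit": False},
    {"variant": "G", "confidence": [0.10, 0.10, 0.80], "incalls": False, "omit": False},
]

# Keep only reads flagged for variant calling and not omitted for length.
called = [r for r in reads if r["incalls"] and not r["omit"]]
correct = sum(call_variant(r["confidence"]) == r["variant"] for r in called)
accuracy = correct / len(called)
```

Reads used to build the consensuses are excluded here through incalls, which keeps the reads being scored separate from the reads that defined the reference levels.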

Files (165.4 MB)

Checksum                              Size
md5:2932dae09a902b783fd481cc33d779e5  5.1 kB
md5:1bb7d2c6e645b434f68f8eddc559ec5b  16.7 kB
md5:0c6ea4002b0f251726edb4bfffe5bfb4  3.9 kB
md5:0ccd71d9c56eee3d4f7fd5d31bd1ebb1  3.4 kB
md5:97d08eff3c647f137e7d13eb25a23750  9.7 kB
md5:65e7e28ae96d0601789825cb2eb94249  3.9 kB
md5:ba5ac8120113e191fdaf079a13da7358  2.4 kB
md5:60baf6fa3f4f27152806ce5b22e1ca72  9.0 kB
md5:0c5a4665cef44145f17bdd8f58ff2079  409 Bytes
md5:3eaa640bdc9ddbb605d24f3f9e26d433  1.3 kB
md5:5b46d40537c9636276383de6b89dd079  1.2 kB
md5:67e46ad713ef8a429d9224586ef8c2bd  112.4 MB
md5:a37f851d7616753d98b6cfcf40ad09b8  52.2 MB
md5:be03ecd2ec553b4db9b2e80422a4d636  2.7 kB
md5:d17ac7f7db65bdae589c5491d1287a73  2.8 kB
md5:44add7c5078c71ccf076ca426e161e19  631 Bytes
md5:36e148523dc4a48ca9324406e7cd3bd7  2.1 kB
md5:3061c5f5255c2161ddbda34575159e95  4.5 kB
md5:3ae4c310b8ce1c381de34b6842157c6c  2.9 kB
md5:b57d45c69cf409ea5e6283cd2ed88086  28.9 kB
md5:8a90ea0bf8d76a87b430f40a00351e68  760.4 kB
md5:a2c9065e400f3052a2938a42623da578  149 Bytes
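
The MD5 checksums listed above can be used to confirm that a download is intact. The following is a small sketch using Python's standard hashlib module; the helper names are hypothetical, and the file path must be supplied by the user, since the listing here does not show which file name belongs to which checksum.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use low for the larger files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed value such as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

To use it, pair each downloaded file with the checksum shown next to it on the record page and call matches_listing(path, "md5:...") with that value.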