Clonal decomposition and DNA replication states defined by scaled single cell genome sequencing

McPherson, Andrew William

doi:10.5281/zenodo.3445364

Published September 15, 2019 | Version v2

Dataset Open

McPherson, Andrew William¹

1. Memorial Sloan Kettering Cancer Center

OV2295 Tables

ov2295_breakpoint_counts.csv.gz: Table of breakpoint counts per cell

prediction_id: identifier for the breakpoint
cell_id: identifier for the cell
read_count: number of reads
library_id: identifier for the DNA library
sample_id: identifier for the sequenced sample
chromosome_1: chromosome of breakend 1
strand_1: orientation of break end 1
position_1: position of break end 1
chromosome_2: chromosome of breakend 2
strand_2: orientation of break end 2
position_2: position of break end 2

ov2295_cell_cn.csv.gz: Table of cell specific copy number

cell_id: identifier for the cell
sample_id: identifier for the sequenced sample
library_id: identifier for the DNA library
chr: chromosome of bin
start: start of bin
end: end of bin
reads: number of reads
copy: raw normalized copy number
state: copy number state
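As an illustration of how this table might be consumed, the following minimal sketch reads the gzip-compressed CSV with the Python standard library and groups the `state` column into one profile per cell. The helper names `open_table` and `state_profiles` are hypothetical, and the sketch assumes that `start`, `end` and `state` parse as numbers.

```python
import csv
import gzip
from collections import defaultdict


def open_table(path):
    """Open a gzip-compressed CSV file as a text stream."""
    return gzip.open(path, "rt", newline="")


def state_profiles(lines):
    """Map each cell_id to {(chr, start, end): state}.

    `lines` is any iterable of CSV text lines with a header row,
    for example the stream returned by open_table().
    """
    profiles = defaultdict(dict)
    for row in csv.DictReader(lines):
        bin_key = (row["chr"], int(row["start"]), int(row["end"]))
        # Assumption: state is written as a number such as "2" or "2.0".
        profiles[row["cell_id"]][bin_key] = int(float(row["state"]))
    return dict(profiles)
```

For the real file this would be called as `state_profiles(open_table("ov2295_cell_cn.csv.gz"))`. The file is 171.5 MB compressed, so holding every profile in memory may need several gigabytes.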

ov2295_cell_metrics.csv.gz: Table of cell metrics

cell_id: identifier of the cell
unpaired_mapped_reads: number of unpaired mapped reads
paired_mapped_reads: number of mapped reads that were properly paired
unpaired_duplicate_reads: number of unpaired duplicated reads
paired_duplicate_reads: number of paired reads that were also marked as duplicate
unmapped_reads: number of unmapped reads
percent_duplicate_reads: percentage of duplicate reads
estimated_library_size: scaled total number of mapped reads
total_reads: total number of reads, regardless of mapping status
total_mapped_reads: total number of mapped reads
total_duplicate_reads: number of duplicate reads
total_properly_paired: number of properly paired reads
coverage_breadth: percentage of genome covered by at least one read
coverage_depth: average reads per nucleotide position in the genome
median_insert_size: median insert size between paired reads
mean_insert_size: mean insert size between paired reads
standard_deviation_insert_size: standard deviation of the insert size between paired reads
index_sequence: index sequence of the adaptor sequence
column: column of the cell on the nanowell chip
img_col: column of the cell from the perspective of the microscope
index_i5: id of the i5 index adapter sequence
sample_type: type of the sample
primer_i7: id of the i7 index primer sequence
experimental_condition: experimental treatment of the cell, includes controls
index_i7: id of the i7 index adapter sequence
cell_call: living/dead classification of the cell, usually based on staining; C1 == living, C2 == dead
sample_id: name of the sample
primer_i5: id of the i5 index primer sequence
row: row of the cell on the nanowell chip
library_id: identifier for the DNA library
index: ignored
multiplier: the value from the set [1..6] that was chosen during parameter searching
MSRSI_non_integerness: median of segment residuals from segment integer copy number states
MBRSI_dispersion_non_integerness: median of bin residuals from segment integer copy number states
MBRSM_dispersion: median of bin residuals from segment median copy number values
autocorrelation_hmmcopy: hmmcopy copy autocorrelation
cv_hmmcopy: ignored
empty_bins_hmmcopy: number of empty bins in hmmcopy
mad_hmmcopy: median absolute deviation of hmmcopy copy
mean_hmmcopy_reads_per_bin: mean reads per hmmcopy bin
median_hmmcopy_reads_per_bin: median reads per hmmcopy bin
std_hmmcopy_reads_per_bin: standard deviation value of reads in hmmcopy bins
total_halfiness: summed halfiness penalty score of the cell
total_mapped_reads_hmmcopy: total mapped reads in all hmmcopy bins
scaled_halfiness: summed scaled halfiness penalty score of the cell
mean_state_mads: mean value for all median absolute deviation scores for each state
mean_state_vars: variance value for all median absolute deviation scores for each state
mad_neutral_state: median absolute deviation score of the neutral 2 copy state
breakpoints: number of breakpoints, as indicated by state changes not at the ends of chromosomes
mean_copy: mean hmmcopy copy value
state_mode: the most commonly occurring state
log_likelihood: hmmcopy log likelihood for the cell
true_multiplier: the exact decimal value used to scale the copy number for segmentation
order: order of the cell in the hierarchical clustering tree
quality: random forest classifier probability score that the cell is good
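The `quality` column can be used to select cells before downstream analysis. The sketch below is one possible filter, not the authors' procedure. The function name `high_quality_cells` and the default threshold of 0.75 are assumptions made for illustration and are not stated in this record.

```python
import csv


def high_quality_cells(lines, min_quality=0.75):
    """Return the set of cell_ids with quality >= min_quality.

    `lines` is any iterable of CSV text lines with a header row.
    min_quality=0.75 is an illustrative default, not a value
    taken from this record.
    """
    keep = set()
    for row in csv.DictReader(lines):
        score = row.get("quality")
        if not score:
            continue  # skip cells without a classifier score
        if float(score) >= min_quality:
            keep.add(row["cell_id"])
    return keep
```

With the real file, the input stream could be obtained from `gzip.open("ov2295_cell_metrics.csv.gz", "rt", newline="")`.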

ov2295_clone_alleles.csv.gz: Table of clone specific allele data

chr: chromosome of bin
start: start of bin
end: end of bin
hap_label: haplotype block identifier
clone_id: clone identifier
allele_1_sum: number of reads for allele 1 of the haplotype block
allele_2_sum: number of reads for allele 2 of the haplotype block
total_counts_sum: total reads for the haplotype block

ov2295_clone_breakpoints.csv.gz: Table of breakpoints per clone for OV2295 samples. Columns:

prediction_id: identifier for the breakpoint
chromosome_1: chromosome of breakend 1
strand_1: orientation of break end 1
position_1: position of break end 1
chromosome_2: chromosome of breakend 2
strand_2: orientation of break end 2
position_2: position of break end 2
clone_id: clone identifier
read_count: number of reads
is_present: presence=1, absent=0

ov2295_clone_clusters.csv.gz: Table of cell clusters as putative clones

cell_id: identifier for the cell
clone_id: clone identifier
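Because this table has one row per cell, it can serve as a lookup from cell to clone when combining the per-cell tables with the per-clone tables. A minimal sketch, with the hypothetical helper names `load_clone_map` and `clone_sizes`:

```python
import csv
from collections import Counter


def load_clone_map(lines):
    """Return {cell_id: clone_id} from clone cluster rows."""
    return {row["cell_id"]: row["clone_id"] for row in csv.DictReader(lines)}


def clone_sizes(clone_map):
    """Count how many cells were assigned to each clone."""
    return Counter(clone_map.values())
```

Cells absent from the map have no clone assignment in this table, so a lookup with `clone_map.get(cell_id)` returns `None` for them.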

ov2295_clone_cn.csv.gz: Table of allele specific copy number per clone for OV2295 samples. Columns:

chr: chromosome of bin
start: start of bin
end: end of bin
total_cn: HMMCopy predicted total copy number
minor_cn: HMM predicted minor copy number
major_cn: HMM predicted major copy number
clone_id: clone identifier

ov2295_clone_snvs.csv.gz: Table of SNVs per clone for OV2295 samples. Columns:

chrom: chromosome
coord: genome position
ref: reference nucleotide
alt: alternate nucleotide
clone_id: clone identifier
ref_counts: number of reads at this position matching the reference nucleotide
alt_counts: number of reads at this position matching the alternate nucleotide
total_counts: total number of reads at this position
is_present: presence=1, absent=0
is_het: is heterozygous
is_hom: is homozygous for the alternate
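A common quantity derived from these columns is the variant allele fraction, alt_counts / total_counts. The sketch below computes it per SNV and clone. The function name `clone_vafs` is hypothetical, and rows with zero coverage are skipped because the fraction is undefined there.

```python
import csv


def clone_vafs(lines):
    """Yield ((chrom, coord, ref, alt), clone_id, vaf) per covered row.

    `lines` is any iterable of CSV text lines with a header row.
    """
    for row in csv.DictReader(lines):
        total = float(row["total_counts"])
        if total == 0:
            continue  # no reads at this position in this clone
        key = (row["chrom"], int(row["coord"]), row["ref"], row["alt"])
        yield key, row["clone_id"], float(row["alt_counts"]) / total
```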

ov2295_nodes.csv.gz: Table of phylogenetic information for SNV evolution

variant_id: identifier for the SNV as chrom:coord:ref:alt
node: node in the phylogenetic tree
loss: probability the SNV was lost at this node
origin: probability the SNV originated at this node
presence: probability the SNV is present at this node
ml_origin: binary indicator the SNV originated at this node
ml_presence: binary indicator the SNV is present at this node
ml_loss: binary indicator the SNV was lost at this node
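The `variant_id` format chrom:coord:ref:alt makes it possible to link this table back to the SNV tables. The sketch below splits the identifier and collects, for each SNV, the node flagged by `ml_origin`. The helper names are hypothetical, and the code assumes that the binary indicators are written as 0/1 and that chromosome names contain no colon.

```python
import csv


def parse_variant_id(variant_id):
    """Split 'chrom:coord:ref:alt' into (chrom, coord, ref, alt)."""
    chrom, coord, ref, alt = variant_id.split(":")
    return chrom, int(coord), ref, alt


def ml_origin_nodes(lines):
    """Return {variant_id: node} for rows where ml_origin is set."""
    origins = {}
    for row in csv.DictReader(lines):
        # Assumption: ml_origin is encoded as 0/1 (possibly "1.0").
        if float(row["ml_origin"]) == 1.0:
            origins[row["variant_id"]] = row["node"]
    return origins
```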

ov2295_snv_counts.csv.gz: Table of SNV counts

chrom: chromosome
coord: genome position
ref: reference nucleotide
alt: alternate nucleotide
ref_counts: number of reads at this position matching the reference nucleotide
alt_counts: number of reads at this position matching the alternate nucleotide
cell_id: identifier for the cell
total_counts: total number of reads at this position
sample_id: identifier for the sequenced sample

ov2295_tree.pickle: Phylogenetic tree in Python pickle format. Requires installation of the stochastic Dollo code, version 0.4.2, from https://bitbucket.org/dranew/dollo.

Note the following sample mapping: ‘SA922’: ‘OV2295(R2)’, ‘SA921’: ‘TOV2295(R)’, ‘SA1090’: ‘OV2295’.
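For convenience, the sample mapping can be written as a lookup table for relabelling the `sample_id` column. This is only a sketch: the names `SAMPLE_LABELS` and `sample_label` are hypothetical, and unknown identifiers are returned unchanged.

```python
# Sample identifiers and labels as given in the note above.
SAMPLE_LABELS = {
    "SA922": "OV2295(R2)",
    "SA921": "TOV2295(R)",
    "SA1090": "OV2295",
}


def sample_label(sample_id):
    """Return the label for a sample_id, or the id itself if unmapped."""
    return SAMPLE_LABELS.get(sample_id, sample_id)
```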

Plots

ov_supp_clone_allele_cn.png: Clone allele ratios for each OV2295 sample.

ov_supp_clone_total_cn.png: Clone copy number for each OV2295 sample.

ov_supp_sample_total_cn.png: Bulk copy number for each OV2295 sample.

ov_supp_sample_allele_cn.png: Bulk allele ratios for each OV2295 sample.

Files (204.0 MB)

Name	MD5	Size
ov2295_breakpoint_counts.csv.gz	bb6d40b02dc36c5f0a2c0f81d9e70388	94.0 kB
ov2295_cell_cn.csv.gz	e8d0a089e264d676f2b8aca62cc5382c	171.5 MB
ov2295_cell_metrics.csv.gz	04c3c529a21ada1be2df08c3e037357b	491.9 kB
ov2295_clone_alleles.csv.gz	9e8569f804bdbddb6ce4ef95b2c1c3a0	6.8 MB
ov2295_clone_breakpoints.csv.gz	6b0f97be49fea56b344b5ea2021a1b4a	21.2 kB
ov2295_clone_clusters.csv.gz	e37fa250d769c4d194af372624aba615	2.5 kB
ov2295_clone_cn.csv.gz	aa20d8db8d3529a5264cf705248bfd93	705.2 kB
ov2295_clone_snvs.csv.gz	4de3a2da63ec4d7150e4282264fa326e	640.6 kB
ov2295_nodes.csv.gz	ce27682c3a1ddfb8caf525467ede8b25	6.9 MB
ov2295_snv_counts.csv.gz	3d1ac0ab42cb8e84caabd9e20b356027	15.3 MB
ov2295_tree.pickle	1ad7887a247243cf235011700e69e449	1.1 kB
ov_supp_clone_allele_cn.png	9cba732f0609c5e0d35dd9ad8c99cf9c	621.0 kB
ov_supp_clone_total_cn.png	c071b873ff0eab17f8ac15178814058f	244.1 kB
ov_supp_sample_allele_cn.png	124247c49f9eb0c12c68dc1b65e5b463	551.7 kB
ov_supp_sample_total_cn.png	97f4403008fc60439345383b8ea85920	148.6 kB
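The md5 checksums in the file listing can be used to confirm that a download is intact. The sketch below hashes files in chunks with `hashlib` and reports which names are missing or do not match. The names `EXPECTED_MD5`, `md5_of` and `failed_checks` are hypothetical, and only two of the listed checksums are included for brevity.

```python
import hashlib
import os

# Two entries copied from the file listing; extend as needed.
EXPECTED_MD5 = {
    "ov2295_clone_clusters.csv.gz": "e37fa250d769c4d194af372624aba615",
    "ov2295_tree.pickle": "1ad7887a247243cf235011700e69e449",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def failed_checks(directory, expected=EXPECTED_MD5):
    """List file names that are missing or whose md5 differs."""
    failed = []
    for name, checksum in expected.items():
        path = os.path.join(directory, name)
        if not os.path.isfile(path) or md5_of(path) != checksum:
            failed.append(name)
    return failed
```

An empty list from `failed_checks("downloads")` would indicate that every listed file in the hypothetical `downloads` directory matches its checksum.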

Additional details

Is supplement to: 10.1101/411058 (DOI)
