cfDNA methylome profiling for detection and subtyping of Small Cell Lung Cancers
Description
Methyl-Binding Domain protein sequencing (MBD-Seq) was applied to samples derived from patients with small cell lung cancer (SCLC), as well as non-cancerous controls. The samples comprised circulating tumour cell-derived explant (CDX) or patient-derived xenograft (PDX) preclinical models derived from 33 patients with SCLC, circulating cell-free DNA (cfDNA) from 78 patients with SCLC, cfDNA from 79 non-cancer controls (NCC), and 13 non-cancerous lung tissue samples.
The objects deposited here include R data files containing qseaSets from the R package qsea. Each qseaSet holds the read counts per sample per 300 base pair window across the genome, together with information on copy number variation and metadata tables. The deposit also includes the scripts used to generate and analyse these objects.
Details of files:
DX_All_min50_max1000_w300_q10.rds
A qseaSet containing all 97 CDX/PDX samples (including replicates) and the 13 normal lung tissue samples. The suffix min50_max1000_w300_q10 indicates that paired reads of between 50 and 1000 base pairs (bp) were included, with a window size of 300 bp and a minimum MAPQ score of 10.
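Because most file names below share this suffix, it can be handy to read the settings back from a file name. The following is a minimal Python sketch, assuming only the naming convention described above; the names QseaParams and parse_params are hypothetical and are not part of qsea or of the deposited scripts.

```python
import re
from typing import NamedTuple


class QseaParams(NamedTuple):
    """Read-filtering and windowing settings encoded in a file name."""
    min_bp: int      # lower bound on paired-read size, in bp
    max_bp: int      # upper bound on paired-read size, in bp
    window_bp: int   # genomic window size, in bp
    min_mapq: int    # minimum MAPQ score


# Pattern for the suffix described above, e.g. "min50_max1000_w300_q10".
_SUFFIX = re.compile(r"min(\d+)_max(\d+)_w(\d+)_q(\d+)")


def parse_params(filename: str) -> QseaParams:
    """Extract the four settings from a deposited file name (hypothetical helper)."""
    match = _SUFFIX.search(filename)
    if match is None:
        raise ValueError(f"no parameter suffix found in {filename!r}")
    return QseaParams(*(int(group) for group in match.groups()))


params = parse_params("DX_All_min50_max1000_w300_q10.rds")
print(params)
```

Files without the suffix (for example DX_merged.rds) raise a ValueError in this sketch rather than returning guessed defaults.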
DX_merged.rds
A qseaSet with the biological replicates of each CDX merged together (formed from the above dataset).
cfDNA_All_min50_max1000_w300_q10.rds
A qseaSet containing 157 cfDNA samples used in the main body of the paper.
NCCsForTrain_All_min50_max1000_w300_q10.rds
A qseaSet containing the 38 NCC cfDNA samples used in the mixture sets for training the tumour/normal classifier. These samples are a subset of those in the cfDNA_All_min50_max1000_w300_q10.rds object.
ValidationSet_All_min50_max1000_w300_q10.rds
A qseaSet containing the 41 NCC cfDNA samples and 78 SCLC cfDNA samples used to validate the tumour/normal classifier. These samples are a subset of those in the cfDNA_All_min50_max1000_w300_q10.rds object.
cfDNApostTreatment_All_min50_max1000_w300_q10.rds
A qseaSet containing 7 cfDNA samples collected at a later, post-treatment timepoint (mostly at disease progression) from the same patients as in the main cfDNA object. Used only for Extended Figure 5 and no other part of the manuscript.
varyDNAinput_All_min50_max1000_w300_q10.rds
A qseaSet containing independent replicates of the cell line H1975, with different amounts of starting DNA (1-75 ng). Used only for Figure 1B.
TNmixSets_combined_regionsFiltered_redo2.rds
A qseaSet containing the synthetic mixture sets generated by mixing either a CDX/PDX sample with an NCC sample, or two NCC samples. Used to train the tumour/normal classifier. To limit file size, this object is restricted to the windows used in the classifier, but the original mixtures were generated across the whole genome.
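To illustrate what a synthetic mixture of two samples can look like at the level of per-window read counts, here is a minimal Python sketch. It is an assumption-laden illustration, not the procedure used to build this object (the deposited scripts define that): the functions thin_counts and mix_windows are hypothetical, the counts are made up, and both samples are assumed to have comparable sequencing depth.

```python
import random
from typing import List


def thin_counts(counts: List[int], keep_prob: float, rng: random.Random) -> List[int]:
    """Keep each read in each window independently with probability keep_prob."""
    return [sum(rng.random() < keep_prob for _ in range(c)) for c in counts]


def mix_windows(tumour: List[int], normal: List[int],
                tumour_fraction: float, seed: int = 0) -> List[int]:
    """Combine two per-window count vectors into one synthetic mixture.

    Hypothetical helper: reads from the tumour-like sample are kept with
    probability tumour_fraction, reads from the normal sample with
    probability 1 - tumour_fraction, and the survivors are summed per window.
    """
    if len(tumour) != len(normal):
        raise ValueError("both samples must cover the same windows")
    if not 0.0 <= tumour_fraction <= 1.0:
        raise ValueError("tumour_fraction must lie in [0, 1]")
    rng = random.Random(seed)
    kept_tumour = thin_counts(tumour, tumour_fraction, rng)
    kept_normal = thin_counts(normal, 1.0 - tumour_fraction, rng)
    return [t + n for t, n in zip(kept_tumour, kept_normal)]


# Toy counts for five 300 bp windows (made-up numbers, not from the deposit).
cdx_counts = [40, 0, 12, 7, 25]
ncc_counts = [2, 1, 10, 0, 3]
mixture = mix_windows(cdx_counts, ncc_counts, tumour_fraction=0.1, seed=1)
print(mixture)
```

Passing two NCC count vectors to the same function gives the analogue of the NCC/NCC mixtures mentioned above.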
CDXarrayWide.csv
Pre-processed 450k Infinium Methylation array beta values for 8 CDX samples which were previously sequenced. Used for Supplementary Table 7 only.
ArrayCDXs_percent100.rds
A qseaSet containing the 8 CDX samples processed on 450k Infinium Methylation arrays (CDXarrayWide.csv) converted to estimated reads.
KeyTFsincYAP.csv
Variance-stabilising transformation (vst) values generated from RNA-Seq data for the CDX/PDX samples, covering the key genes involved in the subtype classifications.
infinium-methylationepic-v-1-0-b5-manifest-file.csv
A lookup file for the Infinium EPIC arrays, as downloaded from https://emea.support.illumina.com/downloads/infinium-methylationepic-v1-0-product-files.html.
SCLC_transcript_expression_adjusted_for_batch_effects.csv
SCLC_methylation_beta_values_of_individual_probes_after_QC_and_filtering_out_SNVs.csv
Pre-processed transcript and methylation beta values from Infinium EPIC arrays for SCLC cell lines, as downloaded from sclccelllines.cancer.gov/sclc/downloads.xhtml (data timestamped as December 2019).
CellLine_100percent.rds
A qseaSet containing the SCLC cell lines converted to estimated MBD-Seq reads.
Cellline_mixtureSets.rds
A qseaSet containing the synthetic mixture sets generated by mixing converted cell lines with a NCC sample. Used to train the subtype classifier.
DilutionSeries_CDX13_min50_max1000_w300_q10.rds
DilutionSeries_CDX29_min50_max1000_w300_q10.rds
DilutionSeries_CDX32_min50_max1000_w300_q10.rds
Three qseaSets containing the results of an in silico dilution of a CDX (CDX13 = POU2F3, CDX29 = NEUROD1, CDX32 = ASCL1) with a single NCC, used to test the limit of detection of the subtype classifier. Reads were mixed at the fastq level before the Nextflow pipeline was run.
DilutionSeries_H446_Rep*_min50_max1000_w300_q10.rds
Eleven qseaSets containing the results of an in silico dilution of the SCLC cell line H446 with a single NCC (not used to build the classifier), used to determine the limit of detection of the tumour/normal classifier. Reads were mixed at the fastq level with different random seeds before the Nextflow pipeline was run.
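The idea of mixing reads with different random seeds can be sketched in Python as follows. This is only an illustration under stated assumptions: dilute_reads is a hypothetical helper, the read identifiers are made up, and real fastq mixing would also need to keep read pairs together, which this toy version does not model. The tools and dilution levels actually used are those in the deposited scripts.

```python
import random
from typing import List, Sequence


def dilute_reads(tumour_reads: Sequence[str], ncc_reads: Sequence[str],
                 tumour_fraction: float, total_reads: int, seed: int) -> List[str]:
    """Draw a fixed-size read set with a chosen tumour fraction.

    Hypothetical helper: sampling is without replacement and fully
    determined by `seed`, so each seed yields one reproducible replicate.
    """
    if not 0.0 <= tumour_fraction <= 1.0:
        raise ValueError("tumour_fraction must lie in [0, 1]")
    n_tumour = round(tumour_fraction * total_reads)
    n_ncc = total_reads - n_tumour
    if n_tumour > len(tumour_reads) or n_ncc > len(ncc_reads):
        raise ValueError("not enough reads to sample without replacement")
    rng = random.Random(seed)
    mixed = rng.sample(list(tumour_reads), n_tumour) + rng.sample(list(ncc_reads), n_ncc)
    rng.shuffle(mixed)  # interleave the two sources
    return mixed


# Toy read identifiers standing in for fastq records (made up).
h446 = [f"H446_read{i}" for i in range(100)]
ncc = [f"NCC_read{i}" for i in range(100)]

# Same dilution level, different seeds: one replicate per seed.
replicates = {seed: dilute_reads(h446, ncc, 0.25, 40, seed) for seed in (1, 2, 3)}
for seed, reads in replicates.items():
    n_tumour = sum(r.startswith("H446") for r in reads)
    print(seed, len(reads), n_tumour)
```

Fixing the seed makes each replicate reproducible, while changing it gives an independent draw at the same nominal tumour fraction.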
poirier_oncogene.rda
PoirierEtAl_Oncogene2015_SuppTab1.csv
PoirierEtAl_Oncogene2015_SuppTab2.csv
Processed data object from Poirier et al., Oncogene 2015 (PMID: 25746006), containing 450k array data for SCLC tumours and normal lungs, along with two of the supplementary tables from that paper.