Published May 28, 2024 | Version v1
Dataset Open

Dataset related to the article "Binary classification of copy number alteration profiles in liquid biopsy with potential clinical impact in advanced NSCLC"

  • 1. Veneto Institute of Oncology - IRCCS

Description

This record contains original data used in the article "Binary classification of copy number alteration profiles in liquid biopsy with potential clinical impact in advanced NSCLC" to develop a linear support vector machine (SVM) classifier to predict chromosomal instability.  

We retrospectively evaluated the results of plasma NGS analyses performed at our Institution with the AVENIO ctDNA Expanded Kit, a 77-gene panel that detects the major classes of genetic alterations. Binary classification into “stable” (SCP) or “unstable” (UCP) chromosomal profiles was initially performed by visual inspection of individual CNV profiles by two independent professionals of our group. We then implemented a support vector machine (SVM) classifier to classify CNV profiles as SCP or UCP automatically, independently of operator experience. Starting from the segmented log2 ratio (.cns) files produced by the CNVkit software, an alteration (“occurrence of instability”) in the CNV profile was defined as any DNA segment, regardless of size, whose log2 copy ratio exceeded a fixed cut-off in absolute value. Two cut-off values were examined: 0.1 and 0.2. For each cut-off, three features were computed and used as covariates in the SVM classifier: 1) number of altered segments (Segments), 2) total length of altered regions (Size) and 3) number of affected chromosomes (Chromosomes).
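
The feature definitions above can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes a tab-separated segment table with `chromosome`, `start`, `end` and `log2` columns (the layout commonly seen in .cns files) and assumes that Size is the sum of `end - start` in base pairs; the unit used in the published matrices is not stated in this record. The function name `cna_features` and the toy segments are hypothetical.

```python
import csv
import io


def cna_features(cns_text, cutoff=0.1):
    """Return (Segments, Size, Chromosomes) for one segment table.

    Assumption: cns_text is tab-separated with a header containing
    'chromosome', 'start', 'end' and 'log2' columns.
    """
    reader = csv.DictReader(io.StringIO(cns_text), delimiter="\t")
    segments = 0          # number of altered segments
    size = 0              # total length of altered regions (assumed bp)
    chromosomes = set()   # chromosomes carrying at least one altered segment
    for row in reader:
        # A segment counts as altered when |log2| strictly exceeds the cut-off.
        if abs(float(row["log2"])) > cutoff:
            segments += 1
            size += int(row["end"]) - int(row["start"])
            chromosomes.add(row["chromosome"])
    return segments, size, len(chromosomes)


# Made-up segments for illustration only; not taken from the dataset.
toy_cns = "\n".join([
    "chromosome\tstart\tend\tlog2",
    "chr1\t0\t1000\t0.05",
    "chr1\t1000\t3000\t0.25",
    "chr2\t0\t500\t-0.15",
    "chr3\t0\t2000\t-0.02",
])

print(cna_features(toy_cns, cutoff=0.1))  # (2, 2500, 2)
print(cna_features(toy_cns, cutoff=0.2))  # (1, 2000, 1)
```

Raising the cut-off from 0.1 to 0.2 drops the chr2 segment in this toy table, which shows how the same profile can yield different feature values in the two data matrices.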

The “dataset_0.1.txt” and “dataset_0.2.txt” files are the original data matrices obtained by considering a cut-off of 0.1 and 0.2, respectively, on the absolute value of the log2 copy ratio.

Each row represents one of the samples available in our study (n=177). Columns contain the following variables: anonymized sample ID (Sample); the class, “stable” or “unstable”, as assigned by two independent professionals of our group (Class); the corresponding binary label (Label: 0 for “stable”, 1 for “unstable”); and the three features used as covariates in the SVM classifier, computed as described above (Segments, Size, Chromosomes).
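
As a hedged illustration of the layout described above, the following Python sketch parses a data matrix with these six columns and checks that Label agrees with Class. The tab delimiter, the presence of a header row, and the literal spellings "stable" and "unstable" in the Class column are assumptions; the helper names `load_matrix` and `check_labels` and the toy rows are hypothetical and do not come from the dataset.

```python
import csv
import io


def load_matrix(text, delimiter="\t"):
    """Parse a data matrix with the six columns named in this record.

    Assumption: the file has a header row and is tab-delimited.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        rows.append({
            "Sample": row["Sample"],
            "Class": row["Class"],
            "Label": int(row["Label"]),
            "Segments": int(row["Segments"]),
            "Size": float(row["Size"]),
            "Chromosomes": int(row["Chromosomes"]),
        })
    return rows


def check_labels(rows):
    """Return True if Label is 0 for 'stable' rows and 1 for 'unstable' rows."""
    expected = {"stable": 0, "unstable": 1}
    return all(
        expected.get(r["Class"].strip().lower()) == r["Label"] for r in rows
    )


# Made-up rows for illustration only; not taken from the dataset.
toy_matrix = "\n".join([
    "Sample\tClass\tLabel\tSegments\tSize\tChromosomes",
    "S1\tstable\t0\t0\t0\t0",
    "S2\tunstable\t1\t12\t350000000\t7",
    "S3\tstable\t0\t1\t1200000\t1",
])

rows = load_matrix(toy_matrix)
print(len(rows), check_labels(rows))  # 3 True
```

Applied to the contents of “dataset_0.1.txt” or “dataset_0.2.txt”, and provided the delimiter assumption holds, the loader should return 177 rows according to the description above.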


For the detailed results of our work, please refer to the full article.

Files


Files (9.9 kB)

md5:54cf05c339bede2481c5a3badb1267bf (5.0 kB)
md5:f3ac510fdf9a4381abc3b08455386ae5 (5.0 kB)

Additional details

Related works

Is cited by
Journal article: 10.1038/s41598-024-68229-6 (DOI)