ExactCN: Predicting Exact Copy Numbers on Whole Exome Sequencing Data
Contributors
Researcher (4):
Supervisor:
Description
Precise quantification of copy number variations (CNVs) is crucial to understanding
the effects of gene dosage, disease severity, and therapeutic response. Although whole-exome sequencing
(WES) offers a cost-effective solution for CNV detection in a clinical setting, it introduces several biases,
including those related to sequence length, GC content, and the use of targeting probes. Consequently,
estimating exact copy numbers remains challenging, especially for WES data. Here, we present
ExactCN, a deep learning–based method for estimating exact per-exon copy numbers from WES data.
The architecture integrates convolutional layers that extract local read-depth patterns with transformer
encoder blocks that capture genomic context and handle sequencing noise. ExactCN is trained on WES
samples from the 1000 Genomes Project, using calls from matched whole-genome sequencing (WGS) data as semi–ground truth. In
benchmarks, ExactCN improves on state-of-the-art integer CNV calling by reducing the
macro-averaged mean absolute error (MAE) from 0.91 to 0.62 and the macro-averaged root mean
squared error (RMSE) from 1.31 to 0.78. It also achieves an overall Pearson correlation of 0.669 and
a Spearman correlation of 0.550, improving on the second-best method by 0.641 and 0.482, respectively.
Furthermore, a fine-tuned version of ExactCN achieves an overall F1-score of 0.657 for
aggregate CNV detection on the clinically important duplicated genes SMN1/2,
demonstrating its applicability to both research and clinical genomic analyses.
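The macro-averaged error metrics quoted above can be illustrated with a minimal sketch. It assumes that "macro-averaged" means computing MAE and RMSE separately for each true copy-number class and then averaging the per-class values with equal weight; the description does not spell out the exact definition used for ExactCN, and the function name `macro_mae_rmse` and the toy data are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def macro_mae_rmse(true_cn, pred_cn):
    """Return (macro MAE, macro RMSE) over true copy-number classes.

    Assumption: errors are grouped by the true copy number, a metric is
    computed per group, and the groups are averaged with equal weight.
    """
    abs_err = defaultdict(list)
    for t, p in zip(true_cn, pred_cn):
        abs_err[t].append(abs(p - t))

    # Per-class mean absolute error and root mean squared error.
    per_class_mae = [sum(e) / len(e) for e in abs_err.values()]
    per_class_rmse = [sqrt(sum(x * x for x in e) / len(e)) for e in abs_err.values()]

    n_classes = len(abs_err)
    return sum(per_class_mae) / n_classes, sum(per_class_rmse) / n_classes


# Toy example (made-up values, not ExactCN results): copy number 2 dominates,
# so rare classes 1 and 3 would be under-weighted by a plain (micro) average.
true_cn = [2, 2, 2, 2, 1, 3]
pred_cn = [2, 2, 2, 3, 2, 3]
mae, rmse = macro_mae_rmse(true_cn, pred_cn)
```

Under this definition each copy-number class contributes equally, so performance on rare deletions and high-copy duplications is not masked by the abundant diploid exons.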
Files
| Name | Size |
|---|---|
| Groundtruth_HG00273.csv (md5:48cf07bbdc0c66c7357d4a91b438fe50) | 243 Bytes |