Published April 23, 2023 | Version v1
Journal article Open

Comparison and benchmark of structural variants detected from long read and long read assembly

  • 1. Xi'an Jiaotong University
  • 2. Leiden University

Description

Structural variant (SV) detection is essential for genomic studies, and long-read sequencing technologies have advanced our capacity to detect SVs directly from reads or from de novo assemblies, known as the read-based and assembly-based strategies, respectively. However, to date, no independent study has compared and benchmarked the two strategies. Here, using SVs detected by 20 read-based and eight assembly-based detection pipelines from six datasets of the HG002 genome, we investigated the factors that influence the two strategies and assessed their performance against well-curated SVs. We found that up to 80% of the SVs could be detected by both strategies across different long-read datasets, whereas the variant type, size and breakpoint detected by the read-based strategy were greatly affected by the aligner. For high-confidence insertions and deletions in non-tandem-repeat regions, a large subset (82% of assembly-based calls and 93% of read-based calls), accounting for around 4,000 SVs, could be captured by both reads and assemblies, whereas the discordance was largely caused by complex SVs and inversions, owing to inconsistent alignment of reads and assemblies at these loci. Finally, when benchmarked against SVs in medically relevant genes, the recall of the read-based strategy reached 77% on 5X coverage data, whereas the assembly-based strategy required 20X coverage data to achieve similar performance. Therefore, integrating SVs from reads and assemblies is suggested for general-purpose detection because complex SVs and inversions are detected inconsistently, while the assembly-based strategy is optional for applications with limited resources.
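The notion of an SV being "detected by both strategies" can be illustrated with a minimal sketch. The matching rule below (same chromosome and type, breakpoints within 500 bp, size ratio of at least 0.7) and the names `SV`, `is_match` and `concordance` are assumptions chosen for illustration; they are not the criteria or code used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    chrom: str
    pos: int      # start breakpoint
    svtype: str   # e.g. "INS", "DEL", "INV"
    length: int   # absolute SV size in bp


def is_match(a, b, max_dist=500, min_size_ratio=0.7):
    """Return True if two calls plausibly describe the same SV.

    The thresholds are illustrative assumptions, not the study's values.
    """
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    if abs(a.pos - b.pos) > max_dist:
        return False
    small, large = sorted((a.length, b.length))
    return large > 0 and small / large >= min_size_ratio


def concordance(read_calls, asm_calls, **kwargs):
    """Fraction of read-based and of assembly-based calls found in the other set."""
    def shared_fraction(calls, others):
        if not calls:
            return 0.0
        hits = sum(any(is_match(c, o, **kwargs) for o in others) for c in calls)
        return hits / len(calls)

    return shared_fraction(read_calls, asm_calls), shared_fraction(asm_calls, read_calls)


# Toy call sets (made-up coordinates, for demonstration only).
read_calls = [
    SV("chr1", 1000, "DEL", 300),
    SV("chr1", 5000, "INS", 120),
    SV("chr2", 700, "INV", 2000),
]
asm_calls = [
    SV("chr1", 1040, "DEL", 310),
    SV("chr1", 5010, "INS", 125),
    SV("chr3", 10, "DEL", 60),
    SV("chr2", 700, "DEL", 2000),
]

read_frac, asm_frac = concordance(read_calls, asm_calls)
print(f"shared among read-based calls: {read_frac:.2f}")
print(f"shared among assembly-based calls: {asm_frac:.2f}")
```

Reporting the shared fraction separately for each call set mirrors how the description quotes two percentages for the same set of shared SVs. The pairwise comparison is quadratic and is only suitable for small examples.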

Files

CMRGs.zip

Files (11.2 GB)

Checksum (MD5)                          Size
md5:f27f48f53c16a9979c85d57d5a9c503c    377.3 kB
md5:cf5cde92c55824ee076d35177144883b    879.6 MB
md5:bddd476dbaede16770a33c445eafbfea    210.7 MB
md5:54d8c17d70ed3f68e74ae4ccd67223c6    1.1 GB
md5:c6409a3caf323b8951d591223c2fa276    1.3 GB
md5:e6f6bfc8210fcd1978b53094cb02bc50    816.6 MB
md5:5d53f37cd4ee73a4690a21a269152b21    2.1 GB
md5:548c24d073b4cb42fe4f08d56705ea2c    1.3 GB
md5:6d1c3b2ed854d45cac97d87bd12765bf    788.5 MB
md5:9403702b312036bc4b929bc655046e64    2.5 GB
md5:9683257e6e0ed647940864b9c7c8d626    79.8 MB
md5:5885924173a6e26a029eff3c62d935ce    2.4 kB
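
The MD5 checksums in the file listing can be used to confirm that a download completed intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `verify` are hypothetical, and the file path is whatever local file the checksum belongs to, since this listing does not show which file name goes with which checksum.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Compare a file's MD5 against a listed value such as 'md5:<hex digest>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

Reading in chunks keeps memory use constant, which matters for the multi-gigabyte archives in this record. A call such as `verify(local_path, "md5:5885924173a6e26a029eff3c62d935ce")` returns `True` only if the local file matches the listed checksum.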