Comparison and benchmark of structural variants detected from long read and long read assembly
- 1. Xi'an Jiaotong University
- 2. Leiden University
Description
Structural variant (SV) detection is essential for genomic studies, and long-read sequencing technologies have advanced our capacity to detect SVs directly from reads or from de novo assemblies, known as the read-based and assembly-based strategies, respectively. However, to date, no independent study has compared and benchmarked the two strategies. Here, on the basis of SVs detected by 20 read-based and eight assembly-based detection pipelines from six datasets of the HG002 genome, we investigated the factors that influence the two strategies and assessed their performance with well-curated SVs. We found that up to 80% of the SVs could be detected by both strategies across different long-read datasets, whereas the variant type, size and breakpoint detected by the read-based strategy were greatly affected by the aligner. For the high-confidence insertions and deletions in non-tandem-repeat regions, a large subset (82% of assembly-based calls and 93% of read-based calls), accounting for around 4,000 SVs, could be captured by both reads and assemblies, whereas the discordance was largely caused by complex SVs and inversions, owing to inconsistent alignment of reads and assemblies at these loci. Finally, when benchmarked against SVs in medically relevant genes, the recall of the read-based strategy reached 77% on 5X coverage data, whereas the assembly-based strategy required 20X coverage data to achieve similar performance. Therefore, integrating SVs from reads and assemblies is suggested for general-purpose detection because complex SVs and inversions are inconsistently detected, while the assembly-based strategy is optional for applications with limited resources.
Files
CMRGs.zip
Total size: 11.2 GB
MD5 | Size |
---|---|
f27f48f53c16a9979c85d57d5a9c503c | 377.3 kB |
cf5cde92c55824ee076d35177144883b | 879.6 MB |
bddd476dbaede16770a33c445eafbfea | 210.7 MB |
54d8c17d70ed3f68e74ae4ccd67223c6 | 1.1 GB |
c6409a3caf323b8951d591223c2fa276 | 1.3 GB |
e6f6bfc8210fcd1978b53094cb02bc50 | 816.6 MB |
5d53f37cd4ee73a4690a21a269152b21 | 2.1 GB |
548c24d073b4cb42fe4f08d56705ea2c | 1.3 GB |
6d1c3b2ed854d45cac97d87bd12765bf | 788.5 MB |
9403702b312036bc4b929bc655046e64 | 2.5 GB |
9683257e6e0ed647940864b9c7c8d626 | 79.8 MB |
5885924173a6e26a029eff3c62d935ce | 2.4 kB |
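
Each file in the listing above is published with an MD5 checksum, so a download can be checked for corruption before use. The following is a minimal sketch of such a check in Python, using only the standard library. The function names `md5_of_file` and `matches_checksum` are illustrative and not part of the record, and because the listing here does not pair file names with checksums, the path and the expected digest must be supplied by the reader.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks.

    Chunked reading keeps memory use flat even for multi-gigabyte archives.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a file's MD5 with an expected digest.

    Accepts either a bare hex digest or the 'md5:<hex>' form used in the listing.
    """
    expected = expected_md5.strip().lower().split(":")[-1]
    return md5_of_file(path) == expected
```

For example, `matches_checksum("downloaded_file.zip", "md5:<digest from the listing>")` returns `True` only when the local file (a hypothetical path here) hashes to the published value. MD5 is adequate for detecting accidental corruption during transfer, but it is not a safeguard against deliberate tampering.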