Published August 11, 2024 | Version v1
Dataset | Open access

Comprehensive Structural Variant Benchmark Dataset: 1100 VCF files from long-read sequencing of 10 NCBI individuals

Sun Yat-sen University

Description

We initially collected data for 10 NCBI individuals: the HG002 family pedigree (HG002 [son], HG003 [father], HG004 [mother]), the HG005 family pedigree (HG005 [son], HG006 [father], HG007 [mother]), and the NA12878, HG00096, HG00512 and CHM13 subjects. We then built variant-calling pipelines from three long-read data types: PacBio CLR (Continuous Long Reads), PacBio CCS (Circular Consensus Sequencing) and Oxford Nanopore (ONT). Each pipeline pairs one of 5 aligners with one of 10 callers, with most parameters left at their default values. Excluding 6 invalid aligner-caller pipelines (pbmm2-NanoVar, lra-Picky, lra-Delly, lra-NanoVar, lra-NanoSV, lra-pbsv), we obtained 1100 VCF files.
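The pipeline count above can be checked with a short enumeration. This is a minimal Python sketch, not the authors' code. Only pbmm2 and lra (aligners) and Picky, Delly, NanoVar, NanoSV and pbsv (callers) are named in the description, so the other aligner and caller names below are hypothetical placeholders. Five aligners times ten callers give 50 combinations, or 44 once the 6 invalid ones are removed. Dividing 1100 by 44 gives 25. That value would equal the number of sample-platform datasets only under an assumption the description does not state: that every valid pipeline was run exactly once on each dataset.

```python
from itertools import product

# Only pbmm2 and lra are named in the description; aligner_3..aligner_5
# are hypothetical placeholders.
ALIGNERS = ["pbmm2", "lra", "aligner_3", "aligner_4", "aligner_5"]

# Picky, Delly, NanoVar, NanoSV and pbsv are named in the description;
# caller_6..caller_10 are hypothetical placeholders.
CALLERS = ["Picky", "Delly", "NanoVar", "NanoSV", "pbsv",
           "caller_6", "caller_7", "caller_8", "caller_9", "caller_10"]

# Aligner-caller pipelines reported as invalid in the description.
INVALID = {("pbmm2", "NanoVar"), ("lra", "Picky"), ("lra", "Delly"),
           ("lra", "NanoVar"), ("lra", "NanoSV"), ("lra", "pbsv")}

TOTAL_VCFS = 1100  # number of VCF files reported for the dataset


def valid_pipelines(aligners, callers, invalid):
    """Return every aligner-caller pair that is not listed as invalid."""
    return [(a, c) for a, c in product(aligners, callers)
            if (a, c) not in invalid]


pipelines = valid_pipelines(ALIGNERS, CALLERS, INVALID)
print(len(pipelines))  # 5 * 10 - 6 = 44

# Assumption (not stated in the description): each valid pipeline was run
# once per sample-platform dataset, so this ratio is the dataset count.
print(TOTAL_VCFS / len(pipelines))  # 25.0
```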

Files

1100VCF.zip (15.2 GB)
md5:9fa003148eab0b7e8770cd02b7b03945
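Because the archive is large, it is worth verifying it against the published MD5 checksum before unpacking it. The following is a minimal Python sketch that uses only the standard library. Two parts of it are assumptions. First, it expects the archive to have been saved as `1100VCF.zip` in the working directory. Second, the helper `count_vcf_members` assumes the VCFs inside the archive end in `.vcf` or `.vcf.gz`; the record does not document the archive's internal layout.

```python
import hashlib
import zipfile
from pathlib import Path

# MD5 checksum published on this record for 1100VCF.zip.
EXPECTED_MD5 = "9fa003148eab0b7e8770cd02b7b03945"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def count_vcf_members(path):
    """Count archive members whose names end in .vcf or .vcf.gz.

    Assumption: the archive layout and file extensions are not documented
    on the record, so this suffix filter is a guess.
    """
    with zipfile.ZipFile(path) as zf:
        return sum(1 for name in zf.namelist()
                   if name.endswith((".vcf", ".vcf.gz")))


if __name__ == "__main__":
    archive = Path("1100VCF.zip")  # assumed download location
    if not archive.exists():
        print(f"{archive} not found")
    elif md5_of_file(archive) != EXPECTED_MD5:
        print("checksum mismatch: re-download the archive")
    else:
        print("checksum OK;", count_vcf_members(archive), "VCF members found")
```

Reading in fixed-size chunks keeps memory use constant, which matters for a 15.2 GB file. If the suffix filter is right, a count of 1100 would agree with the description.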