Undiagnosed rare disease cohort from Seoul National University Hospital (SNUH): v1.0.0
Authors/Creators
- 1. Department of Genomic Medicine, Seoul National University Children's Hospital
- 2. Department of Convergent Bioscience and Informatics, College of Bioscience and Biotechnology, Chungnam National University
- 3. Department of Pediatrics, Seoul National University Children's Hospital
Description
This dataset contains VCF files generated from long-read sequencing analyses of 40 individuals with rare genetic diseases.
Variants were derived from pangenome graphs constructed with the Minigraph-Cactus pipeline (v2.9.2). The graphs were deconstructed into VCF, and the output was then processed with vcfbub to produce standardized multisample VCFs. Genomic coordinates are reported relative to the GRCh38 primary assembly distributed with GENCODE release 46 (GRCh38.primary_assembly.genome.fa).
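VCFs deconstructed from pangenome graphs typically carry explicit REF and ALT sequences, so alleles can be grouped by size directly from each record. The following is a minimal sketch of that idea, not part of the dataset's own tooling. The function names and the 50 bp structural-variant threshold are assumptions chosen for illustration (50 bp is a common convention, not a value stated in this record).

```python
# Assumed threshold: 50 bp is a widely used convention for calling an allele
# a structural variant; it is not taken from the dataset description.
SV_MIN_LENGTH = 50


def classify_allele(ref, alt, sv_min_length=SV_MIN_LENGTH):
    """Classify one ALT allele by comparing its length with REF (hypothetical helper)."""
    if alt in (".", "*"):
        # "." means no alternate allele; "*" marks an allele removed by an upstream deletion.
        return "missing"
    delta = len(alt) - len(ref)
    if delta == 0:
        return "SNV" if len(ref) == 1 else "substitution"
    kind = "insertion" if delta > 0 else "deletion"
    if abs(delta) >= sv_min_length:
        return "SV_" + kind
    return kind


def classify_record(line):
    """Return (chrom, pos, [class per ALT allele]) for one tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref = fields[0], int(fields[1]), fields[3]
    alts = fields[4].split(",")
    return chrom, pos, [classify_allele(ref, alt) for alt in alts]
```

Multiallelic sites are common in multisample graph-derived VCFs, which is why the sketch classifies each ALT allele separately rather than the record as a whole.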
Two multisample VCFs are provided in this dataset.
The first VCF (SNUH.vcf.gz) includes 77 samples (1 CHM13 reference, 47 HPRC samples, and 29 rare disease samples) based on high-depth (30×) long-read assemblies from PacBio and Oxford Nanopore sequencing.
The second VCF (SNUH_lowDepth.vcf.gz) includes 12 samples (1 CHM13 reference and 11 rare disease samples) based on low-depth (10×) PacBio long-read assemblies.
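After downloading, the sample counts stated above (77 and 12) can be checked against each file's header. The following is a minimal sketch using only the Python standard library. It assumes the files are gzip- or bgzip-compressed, as the `.vcf.gz` extension suggests, and the helper names are hypothetical.

```python
import gzip

# Sample counts as stated in the dataset description.
EXPECTED_SAMPLES = {"SNUH.vcf.gz": 77, "SNUH_lowDepth.vcf.gz": 12}


def read_sample_names(path):
    """Return the sample names from the #CHROM header line of a gzipped VCF."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # The first nine columns (CHROM through FORMAT) are fixed VCF fields;
                # everything after them is one column per sample.
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                # Reached data lines without seeing a column header.
                break
    raise ValueError(f"no #CHROM header line found in {path}")


def check_sample_count(path, expected):
    """Return (matches, names) for a VCF path and the expected number of samples."""
    names = read_sample_names(path)
    return len(names) == expected, names
```

For example, `check_sample_count("SNUH.vcf.gz", EXPECTED_SAMPLES["SNUH.vcf.gz"])` would return the match flag together with the list of sample names, which also makes it easy to see how the reference, HPRC, and rare disease samples are labeled.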
Additional details
Related works
- Is published in
- Publication: 10.1101/2025.07.08.25330875 (DOI)
Funding
- Seoul National University Hospital
- SNUH Lee Kun-hee Child Cancer & Rare Disease Project (Grant ID 22B-001-0100)