Published March 27, 2026 | Version v1.0.0

Undiagnosed rare disease cohort from Seoul National University Hospital (SNUH): v1.0.0

  • 1. Department of Genomic Medicine, Seoul National University Children's Hospital
  • 2. Department of Convergent Bioscience and Informatics, College of Bioscience and Biotechnology, Chungnam National University
  • 3. Department of Pediatrics, Seoul National University Children's Hospital

Description

This dataset contains VCF files generated from long-read sequencing analyses of 40 individuals with rare genetic diseases.

Variants were derived from pangenome graphs constructed using the minigraph-cactus pipeline (v2.9.2). Graph-based variants were obtained through deconstruction and subsequent vcfbub processing to produce standardized multisample VCF representations. Genomic coordinates are reported relative to the GENCODE release 46 reference (GRCh38.primary_assembly.genome.fa).

Two multisample VCFs are provided in this dataset.

The first VCF (SNUH.vcf.gz) includes 77 samples (1 CHM13 reference, 47 HPRC samples, and 29 rare disease samples) based on high-depth (30×) long-read assemblies from PacBio and Oxford Nanopore sequencing.

The second VCF (SNUH_lowDepth.vcf.gz) includes 12 samples (1 CHM13 reference and 11 rare disease samples) based on low-depth (10×) PacBio long-read assemblies.
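As a quick sanity check after download, the sample columns of each VCF can be compared against the counts stated above. The following is a minimal sketch, not part of the dataset's own tooling. It assumes the files are bgzip-compressed (which Python's gzip module can read) and that each sample appears as one column after FORMAT in the #CHROM header line. The function name vcf_sample_names is hypothetical.

```python
import gzip


def vcf_sample_names(path):
    """Return the sample column names from the #CHROM header line of a gzip/bgzip VCF."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                fields = line.rstrip("\n").split("\t")
                # Columns 0-8 are the fixed VCF columns (CHROM through FORMAT);
                # every column after that is one sample.
                return fields[9:]
            if not line.startswith("#"):
                # Reached variant records without seeing a column header.
                break
    raise ValueError("no #CHROM header line found in " + str(path))


# Example usage (assumes the file has been downloaded to the working directory):
# names = vcf_sample_names("SNUH.vcf.gz")
# print(len(names))
```

If the header follows the description, the printed count would be expected to match the number of samples listed for that file; a mismatch would be worth checking against the related publication.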

Files

Files (2.7 GB)

Size      MD5 checksum
2.1 GB    aa2b690627b718d3bc27712860317328
2.2 MB    182e28c5417ad0e38d8c55f749309a1a
594.9 MB  59c9a73c50e8c668ee210ee1d3c5863b
1.8 MB    2e0a8cc0d45ccc5cf66a318a5e434658
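The MD5 checksums listed above can be used to confirm that a download completed intact. The following is a minimal sketch using only the Python standard library. The helper names md5_of_file and matches_checksum are hypothetical, and the pairing of a local file with a checksum in the usage comment is an assumption, since the listing above does not show file names.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with an expected value, accepting an optional 'md5:' prefix."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Example usage (the file-to-checksum pairing here is an assumption):
# ok = matches_checksum("SNUH.vcf.gz", "md5:aa2b690627b718d3bc27712860317328")
# print("checksum OK" if ok else "checksum mismatch")
```

Reading in fixed-size chunks keeps memory use flat, which matters for the multi-gigabyte VCF.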

Additional details

Related works

Is published in
Publication: 10.1101/2025.07.08.25330875 (DOI)

Funding

Seoul National University Hospital
SNUH Lee Kun-hee Child Cancer & Rare Disease Project, Grant ID 22B-001-0100