In-house WGS Datasets (VCF) Generated for GenRiskPro Project 2025

Song, Xiya

doi:10.5281/zenodo.16409257

Published July 24, 2025 | Version v1

Dataset Open

Song, Xiya (Researcher)¹

1. KTH Royal Institute of Technology

A total of 275 Turkish (TR) individuals (154 males, 121 females) from 180 distinct families were recruited between January 2022 and June 2023 in the first phase of the Anatolian Precision Medicine Initiative (APMI).

The goal of this study is to investigate genetic variation in the Turkish population through whole-genome sequencing data. By analyzing data from healthy individuals and patients with various diseases, the study aims to improve understanding of disease risk, diagnosis, and treatment at the individual level. This research is part of the Anatolian Precision Medicine Initiative and supports the development of precision medicine in Turkey. The findings will contribute to global reference data and help identify population-specific genetic patterns that are important for public health and clinical care.

Blood samples were collected from all 275 individuals for WGS analysis. A 5-ml venous EDTA blood sample was obtained from each participant and transferred to a sequencing laboratory in a transport box chilled to 4–10 °C with ice packs. Total genomic DNA (gDNA) was purified from the whole blood samples with an automated extractor, QIASYMPHONY (Qiagen, Germany). Nucleic acids were extracted manually from samples for which the isolation kits were not compatible with QIASYMPHONY. The quality and quantity of the DNA samples were initially assessed spectrophotometrically with NanoDrop (Thermo Fisher, USA). Before library preparation, gDNA samples were quantified more precisely by fluorometry with the Qubit Broad Range dsDNA quantitation kit (Waltham, MA, USA). From the gDNA samples of sufficient quality and quantity, PCR-free NGS libraries were constructed with the Illumina DNA PCR-free Prep kit from an input of 300–400 ng. Sequencing was performed on the Illumina NovaSeq 6000 platform. Raw sequence data in Binary Base Call (BCL) format were demultiplexed and converted to FASTQ with DRAGEN Software v3.9.5. Paired-end reads were then aligned to the NCBI reference sequence (GRCh38) using the Burrows-Wheeler Aligner (BWA). Variant calling was performed with the Germline Small Variant Caller of the Dragen server V3 (Illumina) with the default hard filter applied. MultiQC reports were generated to evaluate sequencing quality.
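Because the variant calls carry the caller's default hard filter, a downstream user may want to keep only records whose FILTER column is PASS. The following is a minimal sketch of that step in plain Python, relying only on the standard VCF column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The function names, the example path, and the `LowQual` label used in the usage note are hypothetical; the file layout inside the archive is not described here, so this is an illustration and not the project's own pipeline code.

```python
import gzip


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def iter_pass_records(lines):
    """Yield (chrom, pos, ref, alt) for records whose FILTER column is PASS."""
    for line in lines:
        if line.startswith("#"):
            # Skip meta-information (##...) and the column header (#CHROM...).
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            # Not a complete VCF record (a record has 8 fixed columns); ignore it.
            continue
        chrom, pos, _id, ref, alt, _qual, filt = fields[:7]
        if filt == "PASS":
            yield chrom, int(pos), ref, alt


def count_pass_records(path):
    """Count PASS records in one VCF file (path is supplied by the caller)."""
    with open_vcf(path) as handle:
        return sum(1 for _ in iter_pass_records(handle))
```

For example, `count_pass_records("sample_001.vcf.gz")` would return the number of PASS records in a file of that (hypothetical) name. A record whose FILTER column holds any other value, such as an illustrative `LowQual`, is skipped.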

In this preliminary repository, we include the first 100 samples (1-100) from the project, with the full tabular outputs from the pipeline, including genetic risk (for major rare and complex diseases, mapped using ClinVar and in silico prediction scores), PGx, GWAS, and traits. The full dataset is available upon request and will soon be available from the European Genome-phenome Archive (EGA).

Files (45.1 GB)

Name	MD5	Size
1-100.zip	cea1cc6a2f7347ca41d035e5e4960af6	44.4 GB
python_results_SW_101_2025.zip	5b99492439844fd6470b9ce2b4416360	203.2 MB
python_results_TR_275_2025.zip	5b32eab7a316c5731fdc22949ab08cb7	570.6 MB
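Given the size of the archives, it may be worth checking each download against the MD5 checksums in the file listing above before unpacking. The sketch below uses only the Python standard library. The checksums are copied from the listing, while the function names and the assumption that the files sit in one local directory are illustrative.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "1-100.zip": "cea1cc6a2f7347ca41d035e5e4960af6",
    "python_results_SW_101_2025.zip": "5b99492439844fd6470b9ce2b4416360",
    "python_results_TR_275_2025.zip": "5b32eab7a316c5731fdc22949ab08cb7",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each listed file to True (match), False (mismatch), or None (missing)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of_file(path) == expected) if path.is_file() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "not found"}
    for name, ok in verify_downloads().items():
        print(f"{name}: {labels[ok]}")
```

Reading in 1 MiB chunks keeps memory use flat even for the 44.4 GB archive. A `MISMATCH` result most likely indicates an incomplete or corrupted download.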

