Published April 16, 2026 | Version v4
Journal article Open

Analysis pipeline and code for "Pathogenic germline variations and cancer risk in pediatric patients"

  • 1. Children's Hospital of Fudan University

Description

project/
├── data/                    # Data files directory
├── src/                     # Source code directory
│   └── functions.R          # Custom functions
├── result/                  # Results output directory
└── analysis_NBS.R           # Main analysis script

Overview

This project analyzes the association between genetic variants, both single-nucleotide variants (SNVs) and copy-number variants (CNVs), and clinical phenotypes in tumor samples, focusing on how pathogenic variants affect the risk of tumor development. It uses descriptive statistics, survival analysis, and competing risk models to assess how variant classification influences tumor outcomes.

Key Features

  • Comprehensive variant analysis: Includes both SNVs and CNVs, classified as PLP (Pathogenic/Likely Pathogenic), VUS-LP (Variant of Uncertain Significance/Likely Pathogenic), or other

  • Clinical correlation: Integrates clinical follow-up data to assess tumor development risk

  • Incidence rate calculation: Computes tumor incidence rates per 1000 person-years

Analysis Pipeline

  1. Data preprocessing: Loading and integrating sample information, mutation data, and clinical records

  2. Descriptive statistics: Generating demographic and clinical characteristic tables

  3. Distribution analysis: Examining tumor type distribution across cohorts and variant groups

  4. Survival analysis:

    • Kaplan-Meier curves for tumor-free survival

    • Competing risk models distinguishing between benign and malignant tumors

  5. Incidence calculation: Computing and comparing tumor incidence rates
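
As a rough illustration of step 4, the sketch below fits a Kaplan-Meier curve for tumor-free survival and a cumulative incidence estimate that treats benign and malignant tumors as competing events. It uses only the survival package. The data frame toy, its column names, and all values are hypothetical and are not taken from the project data. The actual script may instead use tidycmprsk or cmprsk for this step.

```r
library(survival)

# Toy follow-up data (hypothetical values, for illustration only).
# time:  years of follow-up
# event: first level "censored" means no tumor was observed;
#        the remaining levels are the competing event types
toy <- data.frame(
  time  = c(1.2, 3.5, 2.0, 5.0, 4.1, 0.8, 6.0, 2.7),
  event = factor(
    c("malignant", "censored", "benign", "censored",
      "malignant", "benign", "censored", "censored"),
    levels = c("censored", "benign", "malignant")
  ),
  group = c("PLP", "PLP", "PLP", "PLP", "other", "other", "other", "other")
)

# Kaplan-Meier estimate of tumor-free survival:
# any tumor (benign or malignant) counts as the event
toy$any_tumor <- as.integer(toy$event != "censored")
km_fit <- survfit(Surv(time, any_tumor) ~ group, data = toy)
summary(km_fit)

# Cumulative incidence with competing risks:
# a factor status makes survfit() return a multi-state
# (Aalen-Johansen) estimate with one curve per event type
ci_fit <- survfit(Surv(time, event) ~ group, data = toy)
print(ci_fit)
```

The Kaplan-Meier fit answers "how long do patients stay tumor-free", while the competing risk fit separates the probability of a benign tumor from that of a malignant one over the same follow-up.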

Output

  • Tables: Descriptive statistics of sample characteristics

  • Figures: Survival curves and cumulative incidence plots

  • Incidence rates: Tumor occurrence rates per 1000 person-years by variant group
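
The incidence rate reported per group is the number of tumor events divided by the total follow-up time, scaled to 1000 person-years. A minimal base R sketch follows. The helper incidence_per_1000, the column names, and the toy values are assumptions made for illustration and do not come from the project code.

```r
# Incidence rate per 1000 person-years = 1000 * events / person-years
# (hypothetical helper, not part of src/functions.R)
incidence_per_1000 <- function(events, person_years) {
  1000 * events / person_years
}

# Toy data (hypothetical values): one row per patient
toy <- data.frame(
  group          = c("PLP", "PLP", "PLP", "other", "other", "other"),
  followup_years = c(2, 3, 5, 4, 6, 10),
  tumor          = c(1, 0, 0, 0, 1, 0)
)

# Sum events and follow-up time within each variant group
by_group <- aggregate(cbind(tumor, followup_years) ~ group,
                      data = toy, FUN = sum)
by_group$rate_per_1000_py <- incidence_per_1000(by_group$tumor,
                                                by_group$followup_years)

# PLP:   1 event / 10 person-years = 100 per 1000 person-years
# other: 1 event / 20 person-years =  50 per 1000 person-years
print(by_group)
```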

Requirements

  • R (≥4.0.0)

  • R packages: survival, survminer, dplyr, table1, openxlsx, gtsummary, cmprsk, tidycmprsk, ggsurvfit, RColorBrewer, ggpubr
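
Before running the pipeline, it can help to confirm that the listed packages are installed. The following is a small sketch that only reports which packages are missing and installs nothing. The variable names are illustrative.

```r
# Packages listed under Requirements
required <- c("survival", "survminer", "dplyr", "table1", "openxlsx",
              "gtsummary", "cmprsk", "tidycmprsk", "ggsurvfit",
              "RColorBrewer", "ggpubr")

# requireNamespace() returns FALSE (without an error) for absent packages
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
not_installed <- required[!installed]

if (length(not_installed) > 0) {
  message("Missing packages: ", paste(not_installed, collapse = ", "))
  message("Install them with install.packages() before running the analysis.")
} else {
  message("All required packages are installed.")
}
```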

Usage

  1. Place input data files in the data/ directory

  2. Run analysis_NBS.R (the main analysis script) to execute the complete analysis pipeline

  3. Find outputs in the result/ directory

Notes

  • Variant classification follows priority: PLP > VUS-LP > other

  • Survival analysis uses incident-cohort samples that have follow-up information

  • Competing risk models account for both benign and malignant tumor events
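
The priority rule PLP > VUS-LP > other can be read as: when a sample carries variants of several classes, it is assigned the highest-priority class. The base R sketch below shows one way to do this. It assumes the rule is applied per sample, and the table layout, column names, and sample IDs are hypothetical.

```r
# Classes in descending priority
priority <- c("PLP", "VUS-LP", "other")

# Toy variant table (hypothetical): one row per variant
variants <- data.frame(
  sample_id = c("S1", "S1", "S2", "S3", "S3"),
  class     = c("other", "PLP", "VUS-LP", "other", "VUS-LP")
)

# Map each class to its rank (1 = highest priority),
# then keep the best (smallest) rank per sample
variant_rank <- match(variants$class, priority)
best_rank <- tapply(variant_rank, variants$sample_id, min)

sample_class <- data.frame(
  sample_id     = names(best_rank),
  variant_group = priority[as.integer(best_rank)]
)
print(sample_class)
```

Here S1 is assigned PLP because its PLP variant outranks its other variant, and S3 is assigned VUS-LP for the same reason.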

Files

paper_code.zip (5.5 kB)
md5:963760ec3608d2391306a0e577bb5ec2