An enhanced two-pass PCA workflow applied to rare disease RNA-Seq data reveals hidden structure and biologically relevant variation
Description
This repository stores DESeq2-normalized gene expression count tables derived exclusively from RNA-seq data. It contains three independent datasets, one for each of the three diseases studied. Each dataset is described below, covering only the RNA-seq count information.
sys_normalized_counts_DESeq2.txt
This file contains DESeq2-normalized RNA-seq gene expression counts obtained from human brain organoids used to study Schaaf–Yang syndrome (SYS), caused by truncating mutations in the MAGEL2 gene.
Rows correspond to genes annotated using Ensembl gene identifiers (e.g. ENSG00000227232), and columns correspond to individual RNA-seq samples. Each column name encodes the experimental metadata using the following structure:
S_<individual>_<replicate>_<time>_<organoid_type>
where:
- individual identifies the donor (S_135 corresponds to the control individual and S_66 to the SYS patient),
- replicate indicates the organoid batch,
- time indicates the culture time point (30d, 60d, or 90d),
- organoid_type indicates the organoid subtype, either human cortical spheroids (hCS) or human subpallial spheroids (hSS).
Sample labels are: S_135_5_30d_hCS, S_135_5_30d_hSS, S_135_5_60d_hCS, S_135_5_60d_hSS, S_135_5_90d_hCS, S_135_5_90d_hSS, S_66_1_30d_hCS, S_66_1_30d_hSS, S_66_1_60d_hCS, S_66_1_60d_hSS, S_66_1_90d_hCS, S_66_1_90d_hSS.
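The naming scheme above can be turned into a per-sample metadata table. The following is a minimal sketch, assuming every label follows the same underscore-separated layout as the twelve labels listed; the names `SYS_LABEL`, `CONDITION`, and `parse_sys_label` are hypothetical and not part of the repository.

```python
import re

# Pattern inferred from the listed labels (an assumption, not a published spec):
# S_<individual>_<replicate>_<time>_<organoid_type>
SYS_LABEL = re.compile(
    r"^S_(?P<individual>\d+)_(?P<replicate>\d+)_(?P<time>\d+d)_(?P<organoid_type>hCS|hSS)$"
)

# Donor-to-condition mapping as stated in the description.
CONDITION = {"135": "control", "66": "SYS"}


def parse_sys_label(label: str) -> dict:
    """Split one SYS column name into its metadata fields."""
    match = SYS_LABEL.match(label)
    if match is None:
        raise ValueError(f"unexpected SYS sample label: {label!r}")
    meta = match.groupdict()
    meta["condition"] = CONDITION[meta["individual"]]
    return meta


labels = [
    "S_135_5_30d_hCS", "S_135_5_30d_hSS", "S_135_5_60d_hCS", "S_135_5_60d_hSS",
    "S_135_5_90d_hCS", "S_135_5_90d_hSS", "S_66_1_30d_hCS", "S_66_1_30d_hSS",
    "S_66_1_60d_hCS", "S_66_1_60d_hSS", "S_66_1_90d_hCS", "S_66_1_90d_hSS",
]
metadata = [parse_sys_label(label) for label in labels]
```

Raising on an unmatched label, rather than skipping it, makes a renamed or mistyped column visible before any downstream analysis.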
The values in the table represent normalized gene expression counts produced by DESeq2 from paired-end RNA-seq data (150 bp reads, NovaSeq platform), with an average sequencing depth of approximately 100 million reads per sample.
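The genes-in-rows, samples-in-columns layout described above can be read with the standard library alone. This is a sketch under stated assumptions: the file is assumed to be tab-delimited, which the description does not confirm, and the header may or may not carry a label for the gene column (tables written from R often omit it). The function name `read_counts` is hypothetical, and the two values in the usage example are invented placeholders, not real counts.

```python
import csv
import io


def read_counts(handle, delimiter="\t"):
    """Return (sample_names, {gene_id: [counts...]}) from a delimited count table."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    samples = None
    counts = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        if samples is None:
            # If the header is one field shorter than the data rows, it has no
            # gene-column label and every header field is a sample name.
            samples = header if len(header) == len(row) - 1 else header[1:]
        counts[row[0]] = [float(value) for value in row[1:]]
    return (samples if samples is not None else header[1:]), counts


# Usage with an in-memory table; the numbers are made-up placeholders.
toy = "gene\tS_135_5_30d_hCS\tS_66_1_30d_hCS\nENSG00000227232\t10.5\t12.0\n"
samples, counts = read_counts(io.StringIO(toy))
```

For a real file, the handle would come from `open("sys_normalized_counts_DESeq2.txt", newline="")`, with the delimiter adjusted if the table turns out not to be tab-separated.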
lafora_normalized_counts_DESeq2.txt
This file contains DESeq2-normalized RNA-seq gene expression counts obtained from a mouse model of Lafora disease, a neurodegenerative disorder caused by mutations in the EPM2A or EPM2B genes.
Rows correspond to genes annotated using Ensembl mouse gene identifiers (e.g. ENSMUSG00000051951), and columns correspond to individual RNA-seq samples. Each column name encodes the experimental group and biological replicate using the following structure:
CTL_<replicate>
epm2a_<replicate>
epm2b_<replicate>
where:
- CTL indicates wild-type control mice,
- epm2a indicates Epm2a knockout mice (Epm2a-/-),
- epm2b indicates Epm2b knockout mice (Epm2b-/-),
- replicate indicates the individual biological replicate.
Sample labels are: CTL_1, CTL_2, CTL_3, CTL_4, epm2a_1, epm2a_2, epm2a_3, epm2b_1, epm2b_2, epm2b_3, epm2b_4.
The dataset includes RNA-seq data from four wild-type control mice and seven knockout mice (three Epm2a-/- and four Epm2b-/-). The values in the table are normalized gene expression counts produced by DESeq2 from RNA-seq data, and were used for differential expression analysis comparing control and mutant animals.
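The label scheme and group sizes described above can be checked with a few lines of code. This is a minimal sketch, assuming each label is a group prefix and a replicate number joined by a single underscore; `GROUPS` and `parse_lafora_label` are hypothetical names.

```python
from collections import Counter

# Group prefixes and their meaning, as given in the description.
GROUPS = {
    "CTL": "wild-type control",
    "epm2a": "Epm2a knockout",
    "epm2b": "Epm2b knockout",
}


def parse_lafora_label(label: str) -> tuple:
    """Split a column name such as 'epm2a_2' into (group, replicate)."""
    group, sep, replicate = label.rpartition("_")
    if not sep or group not in GROUPS or not replicate.isdigit():
        raise ValueError(f"unexpected Lafora sample label: {label!r}")
    return group, int(replicate)


labels = [
    "CTL_1", "CTL_2", "CTL_3", "CTL_4",
    "epm2a_1", "epm2a_2", "epm2a_3",
    "epm2b_1", "epm2b_2", "epm2b_3", "epm2b_4",
]

# Tally samples per experimental group.
counts = Counter(parse_lafora_label(label)[0] for label in labels)
knockouts = counts["epm2a"] + counts["epm2b"]
```

The tally gives four controls and seven knockouts, matching the group sizes stated in the text.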
pmm2_normalized_counts_DESeq2.txt
This file contains DESeq2-normalized RNA-seq gene expression counts obtained from skin fibroblast cell lines of patients affected by PMM2 congenital disorder of glycosylation (PMM2-CDG), caused by loss-of-function mutations in the PMM2 gene.
Rows correspond to genes annotated using Ensembl human gene identifiers (e.g. ENSG00000279457), and columns correspond to individual RNA-seq samples. Each column name encodes the patient identifier, disease severity group, and biological replicate using the following structure:
P<patient>_<severity>_<replicate>
where:
- patient identifies the individual patient,
- severity indicates disease severity classified as HIGH or LOW,
- replicate indicates the biological replicate.
The sample labels are: P10_HIGH_4, P11_HIGH_5, P12_HIGH_6, P8_HIGH_2, P9_HIGH_3, P1_LOW_1, P2_LOW_2, P3_LOW_3, P4_LOW_4, P5_LOW_5.
The dataset includes RNA-seq data from 10 PMM2-CDG patient-derived fibroblast cell lines, five classified as high-severity and five as low-severity. The values in the table are normalized gene expression counts produced by DESeq2 from RNA-seq data and were used for differential expression analysis in which the low-severity group served as the reference (labelled "control") and the high-severity group as the comparison (labelled "treat").
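The sample labels and the control/treat assignment described above can be combined into a small design table. This is a sketch, assuming all labels follow the underscore-separated pattern of the ten labels listed; `PMM2_LABEL`, `DESIGN_ROLE`, and `parse_pmm2_label` are hypothetical names, and the role mapping simply mirrors the wording of the description.

```python
import re
from collections import Counter

# Pattern inferred from the listed labels: P<patient>_<severity>_<replicate>
PMM2_LABEL = re.compile(r"^P(?P<patient>\d+)_(?P<severity>HIGH|LOW)_(?P<replicate>\d+)$")

# Role of each severity group in the differential expression comparison.
DESIGN_ROLE = {"LOW": "control", "HIGH": "treat"}


def parse_pmm2_label(label: str) -> dict:
    """Split one PMM2 column name into patient, severity, replicate and role."""
    match = PMM2_LABEL.match(label)
    if match is None:
        raise ValueError(f"unexpected PMM2 sample label: {label!r}")
    meta = match.groupdict()
    meta["role"] = DESIGN_ROLE[meta["severity"]]
    return meta


labels = [
    "P10_HIGH_4", "P11_HIGH_5", "P12_HIGH_6", "P8_HIGH_2", "P9_HIGH_3",
    "P1_LOW_1", "P2_LOW_2", "P3_LOW_3", "P4_LOW_4", "P5_LOW_5",
]
design = [parse_pmm2_label(label) for label in labels]
by_role = Counter(row["role"] for row in design)
```

Keeping the role mapping in one dictionary makes it easy to swap the reference group if a different contrast is needed.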