Generate patient profiles from BRCA TCGA data for further logical modelling.
The cohort includes more than 800 patients with several kinds of omics data: exome sequencing, Copy Number Alterations (CNA), RNA and proteomics, along with clinical annotations.
Most patients have all omics data:
We investigate the relations between RNAseq data and BC subtypes. Subtypes have been defined with the PAM50 method. First, here is the distribution of BC subtypes across the cohort:
RNAseq data is projected onto the PC1/PC2 space (from Principal Component Analysis), using either all genes or PAM50 genes only (47 of the 50 genes in the PAM50 list are present in the RNAseq data).
We need to assign Boolean effects to mutations: either 0 (inactivating) or 1 (activating). A mutation stays unassigned in the absence of any evidence.
Assignment methods and their respective influence:
## [1] "Sankey plots of mutation assignments depending on methods used"
## [1] "Sankey plots of mutation assignments depending on methods used (restricted to model-related nodes)"
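The three-valued assignment (0, 1 or unassigned) obtained by combining several methods could be sketched as follows, in Python for illustration. The two methods shown here (`by_variant_type`, `by_gene_role`), the dictionary keys and the rules they encode are hypothetical stand-ins, not the methods actually compared in the Sankey plots.

```python
from typing import Callable, List, Optional

# A method inspects one mutation record and returns
# 1 (activating), 0 (inactivating) or None (no evidence).
Method = Callable[[dict], Optional[int]]


def by_variant_type(mutation: dict) -> Optional[int]:
    # Assumption: truncating variants are treated as inactivating.
    truncating = {"Nonsense_Mutation", "Frame_Shift_Del", "Frame_Shift_Ins"}
    return 0 if mutation.get("variant_type") in truncating else None


def by_gene_role(mutation: dict) -> Optional[int]:
    # Assumption: a curated gene role annotation is available.
    role = mutation.get("gene_role")
    if role == "oncogene":
        return 1
    if role == "tumour_suppressor":
        return 0
    return None


def assign_effect(mutation: dict, methods: List[Method]) -> Optional[int]:
    """Return the verdict of the first method that has evidence, else None."""
    for method in methods:
        effect = method(mutation)
        if effect is not None:
            return effect
    return None


methods = [by_variant_type, by_gene_role]
truncated = {"variant_type": "Nonsense_Mutation", "gene_role": "oncogene"}
missense = {"variant_type": "Missense_Mutation", "gene_role": "oncogene"}
unknown = {"variant_type": "Missense_Mutation"}
effects = [assign_effect(m, methods) for m in (truncated, missense, unknown)]
```

The order of `methods` acts as a priority: changing it changes which assignments win, which is the kind of influence the Sankey plots are meant to show.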
Now we can summarize patient mutation profiles after processing. In the following plots we focus on model-related genes only.
For CNA, we have decided to focus on stringent amplifications/deletions corresponding to +2/-2 GISTIC results. We produce the same kind of plots.
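A minimal sketch of this stringent thresholding, in Python for illustration. It assumes that a +2 amplification maps to 1 and a -2 deletion maps to 0, with every other GISTIC level left unassigned; the function name `binarize_gistic` is hypothetical.

```python
from typing import Optional


def binarize_gistic(score: int) -> Optional[int]:
    """Map a discrete GISTIC level (-2..2) to a Boolean CNA effect.

    Assumption: only the stringent levels are kept, so +2 gives 1,
    -2 gives 0, and -1, 0 and +1 stay unassigned (None).
    """
    if score == 2:
        return 1
    if score == -2:
        return 0
    return None


cna_profile = {level: binarize_gistic(level) for level in (-2, -1, 0, 1, 2)}
```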
RNA data is intrinsically continuous and therefore requires preliminary processing. Note that TCGA expression data comes from RNAseq (unlike METABRIC data, which comes from microarrays).
Now, what about the distribution of gene categories (Bimodal, Unimodal…) across the cohort?
## [1] "TCGA assignments:"
Bimodal | Discarded | Unimodal | ZeroInf |
---|---|---|---|
131 | 1083 | 16343 | 2883 |
## [1] "TCGA_ref assignments:"
Bimodal | Discarded | Unimodal | ZeroInf |
---|---|---|---|
158 | 1196 | 17122 | 1964 |
## [1] "Consistency between both TCGA sub-cohorts:"
TCGA (rows) / TCGA_ref (columns) | Bimodal | Discarded | Unimodal | ZeroInf |
---|---|---|---|---|
Bimodal | 11 | 0 | 95 | 25 |
Discarded | 0 | 974 | 0 | 109 |
Unimodal | 25 | 0 | 16241 | 77 |
ZeroInf | 122 | 222 | 786 | 1753 |
## [1] "Common bimodal genes:"
## Gene1 Gene2 Gene3 Gene4
## "FGF7P6" "GSTM1" "OBP2B" "RPL9"
## Gene5 Gene6 Gene7 Gene8
## "RPS27" "RPS28" "SERPINA6" "TMEM189-UBE2V1"
## Gene9 Gene10 Gene11
## "TSNAX-DISC1" "TYW1B" "ZFP91-CNTF"
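The consistency check between the two sets of assignments amounts to a cross-tabulation over shared genes. Here is a small sketch in Python for illustration, using placeholder gene names (`GENE_A`, ...) rather than real data; `cross_tabulate` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Tuple


def cross_tabulate(first: Dict[str, str], second: Dict[str, str]) -> Counter:
    """Count genes per (category in first, category in second) pair.

    Only genes present in both assignments are counted.
    """
    shared = first.keys() & second.keys()
    return Counter((first[gene], second[gene]) for gene in shared)


# Toy assignments with placeholder gene names.
tcga = {"GENE_A": "Bimodal", "GENE_B": "Unimodal", "GENE_C": "ZeroInf"}
tcga_ref = {"GENE_A": "Bimodal", "GENE_B": "Unimodal", "GENE_C": "Unimodal"}

table = cross_tabulate(tcga, tcga_ref)

# Genes classified as Bimodal in both assignments.
common_bimodal = sorted(
    gene for gene, category in tcga.items()
    if category == "Bimodal" and tcga_ref.get(gene) == "Bimodal"
)
```

In such a table, row totals recover the counts of the first assignment and column totals those of the second, which is a quick sanity check on the result.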
## [1] "TCGA assignments for model-related nodes:"
Unimodal | ZeroInf |
---|---|
110 | 3 |
Here are some distribution plots, randomly picked from each category in the TCGA cohort.
Depending on the distribution category of each gene, we apply a dedicated binarization method:
## [1] "Bimodal example:"
## [1] "Unimodal example:"
## [1] "ZeroInf example:"
## [1] "Some possible differences between methods w/ and w/o reference"
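To make category-dependent binarization concrete, here is a rough sketch in Python. The rules are assumptions chosen for illustration and are not taken from this report: unimodal genes are binarized only at their outliers (Tukey fences at 1.5 times the interquartile range), and bimodal genes are split by a simple one-dimensional two-means clustering, used here as a stand-in for whatever mixture model the pipeline applies. Zero-inflated genes are left out.

```python
import statistics
from typing import List, Optional


def binarize_unimodal(values: List[float]) -> List[Optional[int]]:
    """Assumed rule: low outliers give 0, high outliers give 1, the rest None."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [0 if v < low_fence else 1 if v > high_fence else None
            for v in values]


def binarize_bimodal(values: List[float], iterations: int = 20) -> List[int]:
    """Assumed rule: split at the midpoint between two cluster means."""
    low_centre, high_centre = min(values), max(values)
    for _ in range(iterations):
        cut = (low_centre + high_centre) / 2
        low = [v for v in values if v <= cut]
        high = [v for v in values if v > cut]
        if not low or not high:  # degenerate case: all values identical
            break
        low_centre, high_centre = statistics.mean(low), statistics.mean(high)
    cut = (low_centre + high_centre) / 2
    return [1 if v > cut else 0 for v in values]


unimodal_bits = binarize_unimodal([float(v) for v in range(10, 19)] + [100.0])
bimodal_bits = binarize_bimodal([0.1, 0.2, 0.15, 5.0, 5.2, 4.9])
```

Under these assumptions most samples of a unimodal gene stay unassigned, while every sample of a bimodal gene receives a value.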
Before merging independent profiles into multi-omics profiles, let’s have a look at the relations between data types.
## [1] "PDK1 is the anti-correlated gene"
In particular, is there any mutation/CNA binary inconsistency?
Patient | Gene | Mut | CNA |
---|---|---|---|
TCGA-AR-A24Q-01 | CASP8 | 0 | 1 |
TCGA-A8-A07R-01 | NF1 | 0 | 1 |
TCGA-EW-A3U0-01 | TP53 | 0 | 1 |
TCGA-BH-A1FN-01 | PTEN | 0 | 1 |
In case of ambiguity, priority is given to mutations over CNA.
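Listing such binary inconsistencies can be sketched as follows, in Python for illustration. Profiles are assumed to be dictionaries keyed by (patient, gene) with values 0, 1 or `None`; the helper `find_inconsistencies` is hypothetical. The first two entries reuse rows from the table above, and the third is a made-up agreeing pair.

```python
from typing import Dict, List, Optional, Tuple

Profile = Dict[Tuple[str, str], Optional[int]]


def find_inconsistencies(first: Profile, second: Profile
                         ) -> List[Tuple[str, str, int, int]]:
    """Return (patient, gene, first value, second value) for every pair
    where both data types are assigned and disagree."""
    conflicts = []
    for key in sorted(first.keys() & second.keys()):
        a, b = first[key], second[key]
        if a is not None and b is not None and a != b:
            conflicts.append((key[0], key[1], a, b))
    return conflicts


mut: Profile = {
    ("TCGA-AR-A24Q-01", "CASP8"): 0,
    ("TCGA-A8-A07R-01", "NF1"): 0,
    ("PATIENT-X", "GENE_A"): 1,  # hypothetical agreeing entry
}
cna: Profile = {
    ("TCGA-AR-A24Q-01", "CASP8"): 1,
    ("TCGA-A8-A07R-01", "NF1"): 1,
    ("PATIENT-X", "GENE_A"): 1,  # hypothetical agreeing entry
}

conflicts = find_inconsistencies(mut, cna)
```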
Patient | Gene | CNA | RNA |
---|---|---|---|
TCGA-A2-A0D0-01 | COX4I2 | 1 | 0 |
TCGA-GI-A2C8-01 | CCNB2 | 1 | 0 |
TCGA-A2-A04P-01 | CCND1 | 1 | 0 |
TCGA-D8-A13Y-01 | CCND1 | 1 | 0 |
TCGA-A1-A0SK-01 | CCND2 | 1 | 0 |
TCGA-A8-A08L-01 | MYC | 1 | 0 |
TCGA-C8-A138-01 | MYC | 1 | 0 |
TCGA-AQ-A54N-01 | NF1 | 1 | 0 |
TCGA-A2-A0CW-01 | PRKCA | 1 | 0 |
TCGA-A2-A0CW-01 | RAF1 | 1 | 0 |
TCGA-E9-A1RH-01 | RAG2 | 0 | 1 |
TCGA-AN-A0AL-01 | ERBB4 | 1 | 0 |
TCGA-JL-A3YW-01 | ERBB4 | 1 | 0 |
TCGA-BH-A0B9-01 | FLT1 | 1 | 0 |
TCGA-EW-A1PB-01 | FLT4 | 1 | 0 |
TCGA-A8-A08B-01 | KDR | 1 | 0 |
TCGA-A8-A06N-01 | SNAI2 | 1 | 0 |
TCGA-A8-A09K-01 | SNAI2 | 1 | 0 |
TCGA-AN-A0XU-01 | RB1 | 1 | 0 |
TCGA-BH-A0B3-01 | RB1 | 1 | 0 |
In case of ambiguity, priority is given to RNA over CNA.
When available, protein data is used; otherwise, we use RNA.
Patient | Gene | RNA | Prot |
---|---|---|---|
TCGA-A2-A0CQ-01 | PRKAA1 | 0 | 1 |
TCGA-E2-A1LG-01 | PRKAA1 | 1 | 0 |
TCGA-EW-A1PA-01 | BAX | 0 | 1 |
TCGA-D8-A1XK-01 | CTNNB1 | 1 | 0 |
TCGA-E2-A1LG-01 | CTNNB1 | 1 | 0 |
TCGA-BH-A0HY-01 | DVL3 | 1 | 0 |
TCGA-A2-A0EX-01 | EEF2 | 1 | 0 |
TCGA-AR-A2LE-01 | EEF2 | 1 | 0 |
TCGA-C8-A1HN-01 | EEF2 | 0 | 1 |
TCGA-BH-A1ES-06 | GSK3A | 1 | 0 |
TCGA-E9-A3X8-01 | MAPK8 | 0 | 1 |
TCGA-A2-A3XU-01 | CDKN1A | 1 | 0 |
TCGA-A2-A259-01 | TP53 | 1 | 0 |
TCGA-BH-A1ES-06 | TP53 | 0 | 1 |
TCGA-B6-A0I2-01 | RPS6KB1 | 0 | 1 |
TCGA-D8-A1X5-01 | RPS6KA1 | 0 | 1 |
TCGA-A1-A0SK-01 | PDK1 | 1 | 0 |
TCGA-A7-A13D-01 | PDK1 | 1 | 0 |
TCGA-AN-A0FZ-01 | PDK1 | 0 | 1 |
TCGA-JL-A3YX-01 | PIK3CA | 0 | 1 |
TCGA-LL-A5YM-01 | PIK3CA | 0 | 1 |
TCGA-EW-A1PB-01 | PTEN | 0 | 1 |
TCGA-LL-A5YL-01 | RAF1 | 0 | 1 |
TCGA-D8-A1XK-01 | BRAF | 1 | 0 |
TCGA-A2-A1FZ-01 | NRAS | 0 | 1 |
TCGA-AC-A3TN-01 | NRAS | 0 | 1 |
TCGA-E9-A1RH-01 | NRAS | 0 | 1 |
TCGA-E2-A14O-01 | KDR | 0 | 1 |
TCGA-E9-A5UP-01 | KDR | 0 | 1 |
TCGA-E2-A10C-01 | SMAD4 | 0 | 1 |
TCGA-E9-A1R6-01 | SMAD4 | 0 | 1 |
TCGA-E2-A1LG-01 | SNAI1 | 0 | 1 |
TCGA-OL-A66I-01 | SNAI1 | 1 | 0 |
TCGA-D8-A143-01 | RB1 | 0 | 1 |
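The pairwise rules stated so far (protein over RNA, RNA over CNA) can be expressed as a priority-ordered merge. The sketch below, in Python for illustration, assumes each data type is a dictionary keyed by (patient, gene); `merge_layers` and the layer names are hypothetical. Where mutations rank relative to expression is not specified here, so mutations are left out of the example.

```python
from typing import Dict, Optional, Sequence, Tuple

Profile = Dict[Tuple[str, str], Optional[int]]


def merge_layers(layers: Dict[str, Profile],
                 priority: Sequence[str]) -> Dict[Tuple[str, str], int]:
    """Merge Boolean profiles; earlier names in `priority` win conflicts."""
    merged: Dict[Tuple[str, str], int] = {}
    # Walk from lowest to highest priority so later writes override.
    for name in reversed(priority):
        for key, value in layers.get(name, {}).items():
            if value is not None:
                merged[key] = value
    return merged


# Values taken from the inconsistency tables above.
layers = {
    "cna": {("TCGA-A2-A0D0-01", "COX4I2"): 1},
    "rna": {("TCGA-A2-A0D0-01", "COX4I2"): 0,
            ("TCGA-A2-A0CQ-01", "PRKAA1"): 0},
    "prot": {("TCGA-A2-A0CQ-01", "PRKAA1"): 1},
}

merged = merge_layers(layers, priority=("prot", "rna", "cna"))
```

With this ordering, COX4I2 takes its RNA value and PRKAA1 takes its protein value.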
Patient | Gene | Mut | Exp |
---|---|---|---|
TCGA-EW-A1P3-01 | CASP8 | 0 | 1 |
TCGA-EW-A1OX-01 | CHEK1 | 0 | 1 |
TCGA-AR-A1AI-01 | CCNB3 | 0 | 1 |
TCGA-D8-A1Y0-01 | MTOR | 1 | 0 |
TCGA-AR-A2LE-01 | CDKN1A | 0 | 1 |
TCGA-A1-A0SK-01 | TP53 | 0 | 1 |
TCGA-A2-A04T-01 | TP53 | 0 | 1 |
TCGA-A2-A3XT-01 | TP53 | 0 | 1 |
TCGA-A7-A4SE-01 | TP53 | 0 | 1 |
TCGA-A8-A07R-01 | TP53 | 0 | 1 |
TCGA-A8-A08X-01 | TP53 | 0 | 1 |
TCGA-AN-A0FY-01 | TP53 | 0 | 1 |
TCGA-AQ-A54N-01 | TP53 | 0 | 1 |
TCGA-AR-A1AJ-01 | TP53 | 0 | 1 |
TCGA-AR-A1AW-01 | TP53 | 0 | 1 |
TCGA-B6-A0WX-01 | TP53 | 0 | 1 |
TCGA-BH-A0AV-01 | TP53 | 0 | 1 |
TCGA-BH-A0B9-01 | TP53 | 0 | 1 |
TCGA-BH-A0E0-01 | TP53 | 0 | 1 |
TCGA-BH-A203-01 | TP53 | 0 | 1 |
TCGA-C8-A12L-01 | TP53 | 0 | 1 |
TCGA-C8-A12Z-01 | TP53 | 0 | 1 |
TCGA-C8-A1HM-01 | TP53 | 0 | 1 |
TCGA-C8-A26W-01 | TP53 | 0 | 1 |
TCGA-C8-A26Y-01 | TP53 | 0 | 1 |
TCGA-C8-A27B-01 | TP53 | 0 | 1 |
TCGA-D8-A1JJ-01 | TP53 | 0 | 1 |
TCGA-D8-A1XT-01 | TP53 | 0 | 1 |
TCGA-E2-A1AZ-01 | TP53 | 0 | 1 |
TCGA-E2-A1L7-01 | TP53 | 0 | 1 |
TCGA-E2-A2P5-01 | TP53 | 0 | 1 |
TCGA-E2-A574-01 | TP53 | 0 | 1 |
TCGA-E9-A244-01 | TP53 | 0 | 1 |
TCGA-EW-A1PC-01 | TP53 | 0 | 1 |
TCGA-GI-A2C9-01 | TP53 | 0 | 1 |
TCGA-LL-A441-01 | TP53 | 0 | 1 |
TCGA-LL-A5YP-01 | TP53 | 0 | 1 |
TCGA-A2-A0T4-01 | PIK3CA | 1 | 0 |
TCGA-A2-A0YC-01 | PIK3CA | 1 | 0 |
TCGA-A8-A095-01 | PIK3CA | 1 | 0 |
TCGA-BH-A0EE-01 | PIK3CA | 1 | 0 |
TCGA-C8-A12N-01 | PIK3CA | 1 | 0 |
TCGA-C8-A131-01 | PIK3CA | 1 | 0 |
TCGA-C8-A1HF-01 | PIK3CA | 1 | 0 |
TCGA-C8-A274-01 | PIK3CA | 1 | 0 |
TCGA-D8-A1JJ-01 | PIK3CA | 1 | 0 |
TCGA-D8-A1JK-01 | PIK3CA | 1 | 0 |
TCGA-E2-A10C-01 | PIK3CA | 1 | 0 |
TCGA-E2-A15K-01 | PIK3CA | 1 | 0 |
TCGA-E9-A1R4-01 | PIK3CA | 1 | 0 |
TCGA-E9-A1RC-01 | PIK3CA | 1 | 0 |
TCGA-EW-A1PE-01 | PIK3CA | 1 | 0 |
TCGA-BH-A18G-01 | PRKCA | 0 | 1 |
TCGA-A7-A26G-01 | PTEN | 0 | 1 |
TCGA-A8-A06U-01 | RHEB | 1 | 0 |
TCGA-D8-A143-01 | RB1 | 0 | 1 |