Objectives

Generate patient profiles from TCGA breast cancer (BRCA) data for subsequent logical modelling.

Independent omics profiles

TCGA dataset recap

The cohort includes more than 800 patients with several kinds of data: exome sequencing, copy number alterations (CNA), RNA, proteomics and clinical annotations.

Most patients have all omics data types available.


We investigate the relations between RNAseq data and breast cancer (BC) subtypes. Subtypes have been defined with the PAM50 method. First, here is the distribution of BC subtypes across the cohort:

RNAseq data are projected onto the PC1/PC2 space of a principal component analysis (PCA), computed either on all genes or on PAM50 genes only (47 of the 50 PAM50 genes are present in the RNAseq data).
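A minimal sketch of this projection in Python, assuming a samples × genes matrix of RNAseq values that are already log-transformed. The function name `pca_scores` and the toy data are hypothetical and not part of the original pipeline. PCA is computed through an SVD of the gene-centered matrix, and the optional gene list silently skips genes absent from the data, which is what happens for 3 of the 50 PAM50 genes here.

```python
import numpy as np


def pca_scores(expr, genes, keep=None, n_components=2):
    """Return sample scores on the first principal components and the genes used.

    expr:  array of shape (n_samples, n_genes), e.g. log2-transformed RNAseq
    genes: column names of expr
    keep:  optional gene list (e.g. PAM50); genes absent from expr are skipped
    """
    genes = list(genes)
    if keep is not None:
        wanted = set(keep)
        idx = [i for i, g in enumerate(genes) if g in wanted]
        expr = expr[:, idx]
        genes = [genes[i] for i in idx]
    # Center each gene so that the principal axes pass through the mean profile
    centered = expr - expr.mean(axis=0)
    # Rows of vt are the principal axes, sorted by decreasing explained variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    return scores, genes


# Toy data: 6 samples x 4 genes; "G9" plays the role of a list gene that is
# missing from the expression matrix.
rng = np.random.default_rng(0)
expr = rng.normal(size=(6, 4))
scores, used = pca_scores(expr, ["G1", "G2", "G3", "G4"], keep=["G1", "G3", "G9"])
print(scores.shape, used)  # (6, 2) ['G1', 'G3']
```

With real data, each row of `scores` is one patient, and the two columns could be plotted and colored by PAM50 subtype.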

Processing pipeline

Mutation profiles

We need to assign a Boolean effect to each mutation: either 0 (inactivating) or 1 (activating). A mutation can remain unassigned in the absence of any evidence.
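One way to combine several assignment methods is sketched here, under the assumption that each method either returns a verdict or abstains. Methods are tried in a fixed order and the first one with evidence wins, so a mutation for which every method abstains stays unassigned (`None`). Both rules, the hotspot set and all field names are hypothetical placeholders, not the methods compared in the Sankey plots.

```python
from typing import Callable, Iterable, Optional

# A method inspects one mutation and returns 0 (inactivating),
# 1 (activating) or None (no evidence either way).
Method = Callable[[dict], Optional[int]]

# Hypothetical evidence sources, for illustration only.
TRUNCATING = {"Nonsense_Mutation", "Frame_Shift_Del", "Frame_Shift_Ins"}
HOTSPOTS = {("PIK3CA", "H1047R"), ("PIK3CA", "E545K")}


def hotspot_rule(m: dict) -> Optional[int]:
    """Known activating hotspot -> 1, otherwise abstain."""
    return 1 if (m["gene"], m["protein_change"]) in HOTSPOTS else None


def truncating_rule(m: dict) -> Optional[int]:
    """Truncating mutation in a tumor suppressor -> 0, otherwise abstain."""
    if m["role"] == "tumor_suppressor" and m["type"] in TRUNCATING:
        return 0
    return None


def assign_effect(mutation: dict, methods: Iterable[Method]) -> Optional[int]:
    """Return the verdict of the first method with evidence, else None."""
    for method in methods:
        verdict = method(mutation)
        if verdict is not None:
            return verdict
    return None  # no evidence: the mutation stays unassigned


methods = [hotspot_rule, truncating_rule]
examples = [
    {"gene": "PIK3CA", "protein_change": "H1047R", "role": "oncogene", "type": "Missense_Mutation"},
    {"gene": "TP53", "protein_change": "R196*", "role": "tumor_suppressor", "type": "Nonsense_Mutation"},
    {"gene": "GENE_X", "protein_change": "A10V", "role": "unknown", "type": "Missense_Mutation"},
]
print([assign_effect(m, methods) for m in examples])  # [1, 0, None]
```

Because the first method with evidence wins, the order of `methods` matters whenever two methods disagree on the same mutation.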

Assignment methods and their respective influence:

Figure: Sankey plot of mutation assignments depending on the methods used.
Figure: Sankey plot of mutation assignments depending on the methods used, restricted to model-related nodes.

Now we can summarize patient mutation profiles after processing. In the following plots, we focus on model-related genes only.

CNA profiles

For CNA, we decided to focus on stringent amplifications and deletions, corresponding to GISTIC values of +2 and -2 respectively. We produce the same kinds of plots as for mutations.
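A minimal sketch of this filtering step, assuming GISTIC calls coded as integers from -2 to +2, and assuming that amplifications map to 1 (active) and deep deletions to 0 (inactive). The function name and gene labels are hypothetical. Intermediate calls (-1, 0, +1) are left unassigned.

```python
from typing import Optional


def cna_to_boolean(gistic: int) -> Optional[int]:
    """Keep only stringent GISTIC calls: +2 -> 1, -2 -> 0, otherwise None."""
    if gistic not in (-2, -1, 0, 1, 2):
        raise ValueError(f"unexpected GISTIC value: {gistic}")
    if gistic == 2:
        return 1  # high-level amplification (assumed to mean "active")
    if gistic == -2:
        return 0  # deep deletion (assumed to mean "inactive")
    return None  # shallow changes and diploid calls are ignored


calls = {"GENE_A": 2, "GENE_B": -2, "GENE_C": 1, "GENE_D": -1, "GENE_E": 0}
print({gene: cna_to_boolean(v) for gene, v in calls.items()})
# {'GENE_A': 1, 'GENE_B': 0, 'GENE_C': None, 'GENE_D': None, 'GENE_E': None}
```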

RNA profiles

RNA data are intrinsically continuous and therefore require preliminary processing. Note that TCGA RNA data come from RNAseq, unlike METABRIC data, which come from microarrays.

Binarization with classification tree


Next, how many genes fall into each distribution category (Bimodal, Unimodal…)?

TCGA assignments:
Bimodal  Discarded  Unimodal  ZeroInf
    131       1083     16343     2883
TCGA_ref assignments:
Bimodal  Discarded  Unimodal  ZeroInf
    158       1196     17122     1964
Consistency between both TCGA sub-cohorts (rows: TCGA, columns: TCGA_ref):
           Bimodal  Discarded  Unimodal  ZeroInf
Bimodal         11          0        95       25
Discarded        0        974         0      109
Unimodal        25          0     16241       77
ZeroInf        122        222       786     1753
Bimodal genes common to both sub-cohorts (11): FGF7P6, GSTM1, OBP2B, RPL9, RPS27, RPS28, SERPINA6, TMEM189-UBE2V1, TSNAX-DISC1, TYW1B, ZFP91-CNTF.
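The consistency table is a contingency table of per-gene categories: its row totals match the TCGA counts (131, 1083, 16343, 2883) and its column totals match the TCGA_ref counts (158, 1196, 17122, 1964). A minimal standard-library sketch of how such a table and the list of shared Bimodal genes could be computed, using hypothetical toy labels and helper names:

```python
from collections import Counter

CATEGORIES = ["Bimodal", "Discarded", "Unimodal", "ZeroInf"]


def cross_tab(rows: dict, cols: dict) -> list:
    """Count genes per (row category, column category), over shared genes."""
    shared = rows.keys() & cols.keys()
    counts = Counter((rows[g], cols[g]) for g in shared)
    return [[counts[(r, c)] for c in CATEGORIES] for r in CATEGORIES]


def common_genes(rows: dict, cols: dict, category: str) -> list:
    """Genes assigned to `category` in both cohorts, sorted by name."""
    return sorted(g for g in rows.keys() & cols.keys()
                  if rows[g] == category and cols[g] == category)


# Toy per-gene assignments for two cohorts (hypothetical gene names)
tcga = {"g1": "Bimodal", "g2": "Unimodal", "g3": "ZeroInf", "g4": "ZeroInf", "g5": "Bimodal"}
tcga_ref = {"g1": "Bimodal", "g2": "Unimodal", "g3": "Discarded", "g4": "ZeroInf", "g5": "Unimodal"}

print(" " * 10 + " ".join(f"{c:>9}" for c in CATEGORIES))
for name, line in zip(CATEGORIES, cross_tab(tcga, tcga_ref)):
    print(f"{name:<10}" + " ".join(f"{n:>9}" for n in line))
print(common_genes(tcga, tcga_ref, "Bimodal"))  # ['g1']
```

The diagonal counts genes with the same category in both cohorts; the Bimodal diagonal cell is therefore the length of the common-genes list.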
TCGA assignments for model-related nodes:
Unimodal  ZeroInf
     110        3
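The classification tree itself is not detailed in this section. As an illustration only, here is a minimal sketch of a decision rule producing the same four labels. The order of the tests, the zero-fraction thresholds and the rule for Discarded genes are hypothetical, and Sarle's bimodality coefficient (large-sample form, cutoff 5/9) is swapped in as the bimodality test, which may differ from the criteria actually used.

```python
import numpy as np


def bimodality_coefficient(x):
    """Sarle's bimodality coefficient, large-sample form: (skew^2 + 1) / kurtosis."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    m2 = np.mean(d ** 2)
    if m2 == 0:
        return 0.0  # constant vector: no evidence of bimodality
    skew = np.mean(d ** 3) / m2 ** 1.5
    kurt = np.mean(d ** 4) / m2 ** 2  # Pearson kurtosis (3 for a Gaussian)
    return float((skew ** 2 + 1) / kurt)


def classify_gene(x, discard_zero_frac=0.95, zeroinf_zero_frac=0.2, bc_cutoff=5 / 9):
    """Assign one gene's expression vector to a category (hypothetical thresholds)."""
    x = np.asarray(x, dtype=float)
    zero_frac = np.mean(x == 0)
    if zero_frac >= discard_zero_frac:
        return "Discarded"  # almost never expressed
    if zero_frac >= zeroinf_zero_frac:
        return "ZeroInf"  # a large spike of zeros plus a continuous part
    if bimodality_coefficient(x) > bc_cutoff:
        return "Bimodal"
    return "Unimodal"


rng = np.random.default_rng(1)
genes = {
    "bimodal": np.concatenate([rng.normal(2, 0.3, 100), rng.normal(8, 0.3, 100)]),
    "unimodal": rng.normal(5, 1, 200),
    "zeroinf": np.concatenate([np.zeros(60), rng.normal(5, 1, 140)]),
    "discarded": np.concatenate([np.zeros(198), [1.0, 2.0]]),
}
print({name: classify_gene(x) for name, x in genes.items()})
# {'bimodal': 'Bimodal', 'unimodal': 'Unimodal', 'zeroinf': 'ZeroInf', 'discarded': 'Discarded'}
```

A Gaussian gives a coefficient close to 1/3, while two well-separated, equally sized modes push it toward 1, which is why 5/9 (the value for a uniform distribution) is a common cutoff.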

Here are distribution plots of genes randomly picked from each category in the TCGA cohort:

Depending on its distribution category, each gene is then binarized:
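A minimal sketch of category-dependent binarization, with hypothetical choices throughout. Bimodal genes are split with an iterative two-means (ISODATA) threshold, used here as a simple stand-in for a mixture-model split. ZeroInf genes are split into zero versus non-zero values. Unimodal genes are binarized only in their tails (bottom and top 10%, an arbitrary cutoff) and left unassigned in between. Discarded genes stay unassigned.

```python
import numpy as np


def isodata_threshold(x, tol=1e-9, max_iter=100):
    """Iterative two-means threshold: midpoint between the means of both groups."""
    x = np.asarray(x, dtype=float)
    t = float(x.mean())
    for _ in range(max_iter):
        low, high = x[x <= t], x[x > t]
        if low.size == 0 or high.size == 0:
            break  # constant vector: no split possible
        new_t = float((low.mean() + high.mean()) / 2)
        if abs(new_t - t) < tol:
            return new_t
        t = new_t
    return t


def binarize(x, category, tail=0.1):
    """Return one value per patient: 0, 1 or None (unassigned)."""
    x = np.asarray(x, dtype=float)
    if category == "Bimodal":
        t = isodata_threshold(x)
        return [int(v > t) for v in x]
    if category == "ZeroInf":
        return [int(v > 0) for v in x]
    if category == "Unimodal":
        lo, hi = np.quantile(x, [tail, 1 - tail])
        if lo == hi:
            return [None] * len(x)  # no spread: nothing to call
        return [0 if v <= lo else 1 if v >= hi else None for v in x]
    return [None] * len(x)  # Discarded (or unknown) genes stay unassigned


print(binarize([0.1, 0.2, 0.15, 5.0, 5.2, 4.9], "Bimodal"))  # [0, 0, 0, 1, 1, 1]
print(binarize([0.0, 0.0, 3.2, 1.1], "ZeroInf"))  # [0, 0, 1, 1]
print(binarize(list(range(1, 11)), "Unimodal"))
# [0, None, None, None, None, None, None, None, None, 1]
```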

Figure: Bimodal example.

Figure: Unimodal example.

Figure: ZeroInf example.

Figure: some possible differences between methods with and without reference.


Normalization

Merged profiles

Relations between data types

Before merging the independent profiles into multi-omics profiles, let’s look at the relations between data types.

PDK1 is the anti-correlated gene.
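The section does not say how PDK1 was identified. Assuming the comparison is a per-gene Pearson correlation between RNA and protein levels across patients, a minimal standard-library sketch for flagging anti-correlated genes could look like this (data structures, function names and toy values are hypothetical):

```python
import math
from typing import Optional


def pearson(x, y) -> Optional[float]:
    """Pearson correlation of two equal-length sequences; None if undefined."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # a constant profile has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def anticorrelated_genes(rna: dict, prot: dict) -> dict:
    """Genes whose RNA and protein levels correlate negatively across patients.

    rna, prot: dicts gene -> list of values, ordered by the same patients.
    """
    result = {}
    for gene in sorted(rna.keys() & prot.keys()):
        r = pearson(rna[gene], prot[gene])
        if r is not None and r < 0:
            result[gene] = r
    return result


rna = {"GENE_A": [1, 2, 3, 4], "GENE_B": [1, 2, 3, 4], "GENE_C": [2, 2, 2, 2]}
prot = {"GENE_A": [2, 4, 6, 8], "GENE_B": [4, 3, 2, 1], "GENE_C": [1, 2, 3, 4]}
print(anticorrelated_genes(rna, prot))  # {'GENE_B': -1.0}
```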

Mutations and CNA

In particular, are there inconsistencies between binary mutation and CNA values for the same patient and gene?

Patient Gene Mut CNA
TCGA-AR-A24Q-01 CASP8 0 1
TCGA-A8-A07R-01 NF1 0 1
TCGA-EW-A3U0-01 TP53 0 1
TCGA-BH-A1FN-01 PTEN 0 1
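Such a list of conflicts could be produced along the lines of the following sketch, assuming each binarized layer is stored as a mapping from (patient, gene) to 0, 1 or None. The function name and the toy patient identifiers are hypothetical. A pair is reported only when both layers have a value and the values differ.

```python
from typing import Dict, List, Optional, Tuple

Layer = Dict[Tuple[str, str], Optional[int]]


def binary_conflicts(layer_a: Layer, layer_b: Layer) -> List[tuple]:
    """(patient, gene, a, b) rows where both layers are defined and disagree."""
    rows = []
    for key in sorted(layer_a.keys() & layer_b.keys()):
        a, b = layer_a[key], layer_b[key]
        if a is not None and b is not None and a != b:
            rows.append((*key, a, b))
    return rows


mut = {("P1", "TP53"): 0, ("P2", "TP53"): None, ("P2", "PTEN"): 0}
cna = {("P1", "TP53"): 1, ("P2", "TP53"): 1, ("P2", "PTEN"): 0}
print("Patient Gene Mut CNA")
for row in binary_conflicts(mut, cna):
    print(*row)  # P1 TP53 0 1
```

The same function would list the CNA/RNA, RNA/protein and mutation/expression conflicts by passing the corresponding pair of layers.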

In case of conflict, priority is given to mutations over CNA.
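A minimal sketch of this priority rule, assuming each binarized layer maps (patient, gene) pairs to 0, 1 or None. The preferred layer wins whenever it has a value, and the other layer fills in otherwise. That fallback for unassigned mutations is an assumption, since the text only states which layer wins in case of conflict; the function name and toy identifiers are hypothetical. The same helper applies unchanged to the RNA-over-CNA and protein-over-RNA rules stated later in this section.

```python
from typing import Dict, Optional, Tuple

Layer = Dict[Tuple[str, str], Optional[int]]


def merge_with_priority(preferred: Layer, fallback: Layer) -> Layer:
    """Combine two layers; `preferred` wins whenever it holds a value."""
    merged = {}
    for key in preferred.keys() | fallback.keys():
        value = preferred.get(key)
        merged[key] = value if value is not None else fallback.get(key)
    return merged


mut = {("P1", "TP53"): 0, ("P1", "PTEN"): None}
cna = {("P1", "TP53"): 1, ("P1", "PTEN"): 0, ("P1", "MYC"): 1}
merged = merge_with_priority(mut, cna)
print(sorted(merged.items()))
# [(('P1', 'MYC'), 1), (('P1', 'PTEN'), 0), (('P1', 'TP53'), 0)]
```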

CNA and RNA

Patient Gene CNA RNA
TCGA-A2-A0D0-01 COX4I2 1 0
TCGA-GI-A2C8-01 CCNB2 1 0
TCGA-A2-A04P-01 CCND1 1 0
TCGA-D8-A13Y-01 CCND1 1 0
TCGA-A1-A0SK-01 CCND2 1 0
TCGA-A8-A08L-01 MYC 1 0
TCGA-C8-A138-01 MYC 1 0
TCGA-AQ-A54N-01 NF1 1 0
TCGA-A2-A0CW-01 PRKCA 1 0
TCGA-A2-A0CW-01 RAF1 1 0
TCGA-E9-A1RH-01 RAG2 0 1
TCGA-AN-A0AL-01 ERBB4 1 0
TCGA-JL-A3YW-01 ERBB4 1 0
TCGA-BH-A0B9-01 FLT1 1 0
TCGA-EW-A1PB-01 FLT4 1 0
TCGA-A8-A08B-01 KDR 1 0
TCGA-A8-A06N-01 SNAI2 1 0
TCGA-A8-A09K-01 SNAI2 1 0
TCGA-AN-A0XU-01 RB1 1 0
TCGA-BH-A0B3-01 RB1 1 0

In case of conflict, priority is given to RNA over CNA.

RNA and Prot

Protein data are used when available; otherwise, RNA data are used.

Patient Gene RNA Prot
TCGA-A2-A0CQ-01 PRKAA1 0 1
TCGA-E2-A1LG-01 PRKAA1 1 0
TCGA-EW-A1PA-01 BAX 0 1
TCGA-D8-A1XK-01 CTNNB1 1 0
TCGA-E2-A1LG-01 CTNNB1 1 0
TCGA-BH-A0HY-01 DVL3 1 0
TCGA-A2-A0EX-01 EEF2 1 0
TCGA-AR-A2LE-01 EEF2 1 0
TCGA-C8-A1HN-01 EEF2 0 1
TCGA-BH-A1ES-06 GSK3A 1 0
TCGA-E9-A3X8-01 MAPK8 0 1
TCGA-A2-A3XU-01 CDKN1A 1 0
TCGA-A2-A259-01 TP53 1 0
TCGA-BH-A1ES-06 TP53 0 1
TCGA-B6-A0I2-01 RPS6KB1 0 1
TCGA-D8-A1X5-01 RPS6KA1 0 1
TCGA-A1-A0SK-01 PDK1 1 0
TCGA-A7-A13D-01 PDK1 1 0
TCGA-AN-A0FZ-01 PDK1 0 1
TCGA-JL-A3YX-01 PIK3CA 0 1
TCGA-LL-A5YM-01 PIK3CA 0 1
TCGA-EW-A1PB-01 PTEN 0 1
TCGA-LL-A5YL-01 RAF1 0 1
TCGA-D8-A1XK-01 BRAF 1 0
TCGA-A2-A1FZ-01 NRAS 0 1
TCGA-AC-A3TN-01 NRAS 0 1
TCGA-E9-A1RH-01 NRAS 0 1
TCGA-E2-A14O-01 KDR 0 1
TCGA-E9-A5UP-01 KDR 0 1
TCGA-E2-A10C-01 SMAD4 0 1
TCGA-E9-A1R6-01 SMAD4 0 1
TCGA-E2-A1LG-01 SNAI1 0 1
TCGA-OL-A66I-01 SNAI1 1 0
TCGA-D8-A143-01 RB1 0 1

Mut and Exp

Patient Gene Mut Exp
TCGA-EW-A1P3-01 CASP8 0 1
TCGA-EW-A1OX-01 CHEK1 0 1
TCGA-AR-A1AI-01 CCNB3 0 1
TCGA-D8-A1Y0-01 MTOR 1 0
TCGA-AR-A2LE-01 CDKN1A 0 1
TCGA-A1-A0SK-01 TP53 0 1
TCGA-A2-A04T-01 TP53 0 1
TCGA-A2-A3XT-01 TP53 0 1
TCGA-A7-A4SE-01 TP53 0 1
TCGA-A8-A07R-01 TP53 0 1
TCGA-A8-A08X-01 TP53 0 1
TCGA-AN-A0FY-01 TP53 0 1
TCGA-AQ-A54N-01 TP53 0 1
TCGA-AR-A1AJ-01 TP53 0 1
TCGA-AR-A1AW-01 TP53 0 1
TCGA-B6-A0WX-01 TP53 0 1
TCGA-BH-A0AV-01 TP53 0 1
TCGA-BH-A0B9-01 TP53 0 1
TCGA-BH-A0E0-01 TP53 0 1
TCGA-BH-A203-01 TP53 0 1
TCGA-C8-A12L-01 TP53 0 1
TCGA-C8-A12Z-01 TP53 0 1
TCGA-C8-A1HM-01 TP53 0 1
TCGA-C8-A26W-01 TP53 0 1
TCGA-C8-A26Y-01 TP53 0 1
TCGA-C8-A27B-01 TP53 0 1
TCGA-D8-A1JJ-01 TP53 0 1
TCGA-D8-A1XT-01 TP53 0 1
TCGA-E2-A1AZ-01 TP53 0 1
TCGA-E2-A1L7-01 TP53 0 1
TCGA-E2-A2P5-01 TP53 0 1
TCGA-E2-A574-01 TP53 0 1
TCGA-E9-A244-01 TP53 0 1
TCGA-EW-A1PC-01 TP53 0 1
TCGA-GI-A2C9-01 TP53 0 1
TCGA-LL-A441-01 TP53 0 1
TCGA-LL-A5YP-01 TP53 0 1
TCGA-A2-A0T4-01 PIK3CA 1 0
TCGA-A2-A0YC-01 PIK3CA 1 0
TCGA-A8-A095-01 PIK3CA 1 0
TCGA-BH-A0EE-01 PIK3CA 1 0
TCGA-C8-A12N-01 PIK3CA 1 0
TCGA-C8-A131-01 PIK3CA 1 0
TCGA-C8-A1HF-01 PIK3CA 1 0
TCGA-C8-A274-01 PIK3CA 1 0
TCGA-D8-A1JJ-01 PIK3CA 1 0
TCGA-D8-A1JK-01 PIK3CA 1 0
TCGA-E2-A10C-01 PIK3CA 1 0
TCGA-E2-A15K-01 PIK3CA 1 0
TCGA-E9-A1R4-01 PIK3CA 1 0
TCGA-E9-A1RC-01 PIK3CA 1 0
TCGA-EW-A1PE-01 PIK3CA 1 0
TCGA-BH-A18G-01 PRKCA 0 1
TCGA-A7-A26G-01 PTEN 0 1
TCGA-A8-A06U-01 RHEB 1 0
TCGA-D8-A143-01 RB1 0 1

Write profiles