Objectives

Generate patient profiles from BRCA METABRIC data for subsequent logical modelling.

Independent omics profiles

META dataset recap

More than 1800 patients with several kinds of data: exome sequencing, Copy Number Alterations (CNA), RNA and clinical annotations.

Most patients have all omics data:

We investigate relations between RNA expression data and breast cancer (BC) subtypes. Subtypes were defined with the PAM50 method. First, here is the distribution of BC subtypes across the cohort:

RNA expression data are projected onto the PC1/PC2 plane of a Principal Component Analysis computed on PAM50 genes only (47 of the 50 genes in the PAM50 list are present in the RNA data).

Processing pipeline

Mutations profiles

We need to assign Boolean effects to mutations: either 0 (inactivating) or 1 (activating). A mutation stays unassigned in the absence of any evidence.
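To make the 0 / 1 / unassigned logic concrete, here is a minimal sketch in Python. It is not the pipeline used for this report: the two evidence sources (`curated_hotspot`, `truncating_is_inactivating`), their rules, and the record fields (`gene`, `change`, `type`) are hypothetical and only illustrate how several methods could be tried in turn, with the mutation left unassigned when none of them provides evidence.

```python
from typing import Callable, Dict, List, Optional, Tuple

Mutation = Dict[str, str]
# A method returns 0 (inactivating), 1 (activating) or None (no evidence).
Method = Callable[[Mutation], Optional[int]]


def curated_hotspot(mutation: Mutation) -> Optional[int]:
    """Hypothetical lookup in a small curated table of known effects."""
    known = {("PIK3CA", "H1047R"): 1}  # illustrative entry only
    return known.get((mutation.get("gene", ""), mutation.get("change", "")))


def truncating_is_inactivating(mutation: Mutation) -> Optional[int]:
    """Hypothetical rule (an assumption): truncating mutations are inactivating."""
    return 0 if mutation.get("type") in {"nonsense", "frameshift"} else None


# Methods are tried in this order; the order itself is an assumption.
METHODS: List[Tuple[str, Method]] = [
    ("curated_hotspot", curated_hotspot),
    ("truncating", truncating_is_inactivating),
]


def assign_effect(
    mutation: Mutation, methods: List[Tuple[str, Method]] = METHODS
) -> Tuple[Optional[int], Optional[str]]:
    """Return (effect, method name) from the first method giving evidence."""
    for name, method in methods:
        effect = method(mutation)
        if effect is not None:
            return effect, name
    return None, None  # unassigned: no method provided evidence
```

Returning the name of the method alongside the effect keeps track of which method produced each assignment, which is the kind of information a per-method summary needs.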

Assignment methods and their respective influence:

## [1] "Sankey plots of mutation assignments depending on methods used"
## [1] "Sankey plots of mutation assignments depending on methods used (restricted to model-related nodes)"

Now we can summarize patient mutation profiles after processing. In the following plots, we focus on model-related genes only.

CNA profiles

For CNA, we have decided to focus on stringent amplifications/deletions corresponding to +2/-2 GISTIC results. We produce the same kind of plots.
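The stringent rule above can be sketched as follows in Python. This is a minimal illustration, assuming GISTIC results are available as integers from -2 to +2 per gene. The function names and the example values are hypothetical and are not taken from the cohort.

```python
from typing import Dict, Optional


def binarize_gistic(score: int) -> Optional[int]:
    """Map a GISTIC score to a Boolean CNA effect.

    +2 (amplification) -> 1, -2 (deletion) -> 0, anything else -> None.
    """
    if score == 2:
        return 1
    if score == -2:
        return 0
    return None  # -1, 0 and +1 are not considered stringent enough


def cna_profile(gistic_by_gene: Dict[str, int]) -> Dict[str, Optional[int]]:
    """Binarize all genes of one patient."""
    return {gene: binarize_gistic(score) for gene, score in gistic_by_gene.items()}


# Illustrative values only (not real patient data).
example = cna_profile({"ERBB2": 2, "RB1": -2, "MYC": 1, "ESR1": 0})
```

With this rule, intermediate scores (-1 and +1) leave the gene unassigned rather than forcing a value.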

RNA profiles

RNA data is intrinsically continuous and therefore requires preliminary processing. Note that METABRIC expression data comes from microarrays.

Binarization with classification tree

Now, what about the distribution of gene categories (Bimodal, Unimodal…) across the cohort?

## [1] "META assignments:"
Bimodal Unimodal
29 24339
## [1] "META assignments for model-related nodes:"
Bimodal Unimodal
1 55

Here are some distribution plots, randomly picked from each category in the META cohort:

Depending on the distribution category, we can then perform binarization:

## [1] "Bimodal example:"

## [1] "Unimodal example:"
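As an illustration of category-dependent binarization, here is a minimal Python sketch. Both rules are assumptions, not the exact procedure of this report: for a bimodal gene, a single threshold is chosen as the split minimizing the within-group squared error (a depth-one tree split), and for a unimodal gene, only values outside Tukey fences (1.5 times the interquartile range) are assigned, the rest staying unassigned. Function names and the `k` parameter are hypothetical.

```python
import statistics
from typing import List, Optional


def split_threshold(values: List[float]) -> float:
    """Threshold of the two-group split with minimal within-group squared error."""
    xs = sorted(values)
    best_cost, best_thr = float("inf"), None
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no threshold can separate equal values
        left, right = xs[:i], xs[i:]
        mean_l, mean_r = statistics.fmean(left), statistics.fmean(right)
        cost = sum((x - mean_l) ** 2 for x in left)
        cost += sum((x - mean_r) ** 2 for x in right)
        if cost < best_cost:
            best_cost, best_thr = cost, (xs[i - 1] + xs[i]) / 2
    if best_thr is None:
        raise ValueError("need at least two distinct values")
    return best_thr


def binarize_bimodal(values: List[float]) -> List[int]:
    """Assign 1 above the split threshold, 0 otherwise."""
    thr = split_threshold(values)
    return [1 if v > thr else 0 for v in values]


def binarize_unimodal(values: List[float], k: float = 1.5) -> List[Optional[int]]:
    """Assign 1/0 only to high/low outliers; other values stay unassigned."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [1 if v > high else 0 if v < low else None for v in values]
```

The design choice worth noting is that unimodal genes are mostly left unassigned: without two clear expression modes, only extreme values are treated as informative.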

Normalization

Merged profiles

Data types relations

Before merging independent profiles into multi-omics profiles, let’s look at the relations between data types.

Mutations and CNA

In particular, is there any mutation/CNA binary inconsistency?

Patient Gene Mut CNA
MB-0607 ERBB2 0 1
MB-4079 KMT2D 0 1
MB-5196 KMT2D 0 1
MB-4332 CDKN1B 0 1
MB-3614 RB1 0 1

In case of ambiguity, priority is given to mutations over CNA.
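The inconsistency check behind such a table can be sketched as follows in Python. This is a minimal illustration, assuming each profile is stored as a mapping from (patient, gene) to 0, 1 or `None` (unassigned). The `Profile` type and the function name are hypothetical, and the example reuses one row of the table above.

```python
from typing import Dict, List, Optional, Tuple

# (patient, gene) -> 0, 1 or None (unassigned)
Profile = Dict[Tuple[str, str], Optional[int]]


def find_conflicts(a: Profile, b: Profile) -> List[Tuple[str, str, int, int]]:
    """List (patient, gene, value in a, value in b) where both are assigned and differ."""
    conflicts = []
    for key in sorted(a.keys() & b.keys()):
        va, vb = a[key], b[key]
        if va is not None and vb is not None and va != vb:
            conflicts.append((key[0], key[1], va, vb))
    return conflicts


# One row from the table above, plus a hypothetical unassigned entry.
mut: Profile = {("MB-0607", "ERBB2"): 0, ("MB-0607", "TP53"): None}
cna: Profile = {("MB-0607", "ERBB2"): 1, ("MB-0607", "TP53"): 0}
conflicts = find_conflicts(mut, cna)
```

Unassigned values never count as conflicts, so only genuine 0/1 disagreements are reported.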

CNA and RNA

Patient Gene CNA RNA
MB-0569 BAD 1 0
MB-0149 CCND2 1 0
MB-4660 CCND2 1 0
MB-5135 CCND2 1 0
MB-4426 E2F5 1 0
MB-5174 E2F5 1 0
MB-5190 E2F5 1 0
MB-3165 E2F6 1 0
MB-2796 EIF4G2 1 0
MB-0396 ESR1 1 0
MB-0453 ESR1 1 0
MB-0516 ESR1 1 0
MB-3386 ESR1 1 0
MB-4640 ESR1 1 0
MB-4871 ESR1 1 0
MB-5135 ESR1 1 0
MB-5634 ESR1 1 0
MB-7038 ESR1 1 0
MB-7082 ESR1 1 0
MB-7269 ESR1 1 0
MB-0191 FOXA1 1 0
MB-3706 FOXA1 1 0
MB-7154 IGF1R 1 0
MB-4289 MCL1 1 0
MB-4350 MCL1 1 0
MB-2923 MYC 1 0
MB-5174 MYC 1 0
MB-5358 CDKN1A 1 0
MB-0439 PBX1 1 0
MB-5039 AKT1S1 1 0
MB-0053 TSC2 1 0

In case of ambiguity, priority is given to RNA over CNA.
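A priority rule of this kind can be sketched as a pairwise merge in Python. This is a minimal illustration, assuming profiles are mappings from (patient, gene) to 0, 1 or `None`. The `Profile` type and `merge_with_priority` are hypothetical names. The sketch only covers the two rules stated here (mutations over CNA, RNA over CNA) and makes no assumption about how mutations and RNA are ranked against each other.

```python
from typing import Dict, Optional, Tuple

# (patient, gene) -> 0, 1 or None (unassigned)
Profile = Dict[Tuple[str, str], Optional[int]]


def merge_with_priority(primary: Profile, secondary: Profile) -> Profile:
    """Merge two profiles; assigned values of `primary` override `secondary`."""
    merged = dict(secondary)
    for key, value in primary.items():
        if value is not None:
            merged[key] = value  # primary wins whenever it is assigned
        else:
            merged.setdefault(key, None)  # keep secondary's value if it exists
    return merged


# Rows taken from the tables above: CNA says 1, the other data type says 0.
cna: Profile = {("MB-0607", "ERBB2"): 1, ("MB-0396", "ESR1"): 1}
mut: Profile = {("MB-0607", "ERBB2"): 0}
rna: Profile = {("MB-0396", "ESR1"): 0}

mut_over_cna = merge_with_priority(mut, cna)
rna_over_cna = merge_with_priority(rna, cna)
```

An unassigned value in the higher-priority profile does not erase an assigned value in the lower-priority one, so CNA information is still used wherever nothing contradicts it.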

Mutations and RNA

Patient Gene Mut RNA
MB-0470 ERBB2 1 0
MB-0528 KMT2D 0 1
MB-3500 KMT2D 0 1
MB-4630 KMT2D 0 1
MB-0133 PIK3CA 1 0
MB-0308 PIK3CA 1 0
MB-0317 PIK3CA 1 0
MB-0422 PIK3CA 1 0
MB-0451 PIK3CA 1 0
MB-0486 PIK3CA 1 0
MB-4012 PIK3CA 1 0
MB-4426 PIK3CA 1 0
MB-5163 PIK3CA 1 0
MB-5204 PIK3CA 1 0
MB-0365 RB1 0 1

Write profiles