Contents

1 Introduction

This vignette show how to use MOFA+ on the bulk multi-omics data set that was used in thr original publication of MOFA and the vignette of the MOFA package. The data consisted of four omics including methylation, RNAseq, genomic and drug response data. In the first application of MOFA we considered all 200 patients as a single group. Here, we will separately consider samples from male and female patients and apply multi-group inference using MOFA+.

2 Load libraries

library(MOFA2)
## Warning: replacing previous import 'MultiAssayExperiment::complete.cases' by
## 'stats::complete.cases' when loading 'MOFA2'
library(MOFAdata)

3 Prepare a MOFA+ object

utils::data("CLL_data")
utils::data("CLL_covariates")
MOFAobject <- create_mofa(CLL_data, groups = CLL_covariates$Gender)
## Creating MOFA object from a list of matrices (features as rows, sample as columns)...
## You have requested the multi-group inference framework.
## It is an advanced option, if this is the first time that you are running MOFA, we suggest that you try first without specifying groups
MOFAobject
## Untrained MOFA model with the following characteristics: 
##  Number of views: 4 
##  Views names: Drugs Methylation mRNA Mutations 
##  Number of features (per view): 310 4248 5000 69 
##  Number of groups: 2 
##  Groups names: m f 
##  Number of samples (per group): 121 79

3.1 Alternative option: create MOFA+ object using MultiAssayExperiment

Instead of a list of matrices as above, you can also provide a MultiAssayExperiment object and build the MOFA+ object based on this. Covariates stored in the colData slot of the MultiAssayExperiment can be used to specify groups.

library(MultiAssayExperiment)

# Create MultiAssayExperiment object 
mae_CLL <- MultiAssayExperiment(
  experiments = CLL_data, 
  colData = CLL_covariates
)

# Build the MOFA object
MOFAobject <- create_mofa(mae_CLL, groups = "Gender")
## You have requested the multi-group inference framework.
## It is an advanced option, if this is the first time that you are running MOFA, we suggest that you try first without specifying groups
MOFAobject
## Untrained MOFA model with the following characteristics: 
##  Number of views: 4 
##  Views names: Drugs Methylation mRNA Mutations 
##  Number of features (per view): 310 4248 5000 69 
##  Number of groups: 2 
##  Groups names: m f 
##  Number of samples (per group): 121 79

3.2 Define MOFA options

Data options: let’s use default

data_opts <- get_default_data_options(MOFAobject)

Model options: let’s use 5 factors

model_opts <- get_default_model_options(MOFAobject)
model_opts$num_factors <- 5

Training options

train_opts <- get_default_training_options(MOFAobject)
train_opts$convergence_mode <- "fast"
train_opts$seed <- 123

3.3 Prepare the MOFA object

MOFAobject <- prepare_mofa(
  object = MOFAobject,
  data_options = data_opts,
  model_options = model_opts,
  training_options = train_opts
)
## Checking data options...
## Checking training options...
## Checking model options...

4 Train the MOFA model

This step takes around 1 min.

outfile = paste0(getwd(),"/CLLmodel.hdf5")
MOFAmodel <- run_mofa(MOFAobject, outfile = outfile)
## Warning: Output file already exists, it will be replaced
## Warning in quality_control(object, verbose = verbose): Factor(s) 3 are strongly correlated with the total number of expressed features for at least one of your omics. Such factors appear when there are global differences between your samples, sometimes because of poor normalisation in the preprocessing steps. We recommend that you either try a better normalisation method or you remove the factors using `subset_factors`.

4.1 Load pre-computed model

MOFA models are saved in hdf5 format and can be loaded into R with the function load_model.

MOFAmodel <- load_model(outfile)
## Warning in quality_control(object, verbose = verbose): Factor(s) 3 are strongly correlated with the total number of expressed features for at least one of your omics. Such factors appear when there are global differences between your samples, sometimes because of poor normalisation in the preprocessing steps. We recommend that you either try a better normalisation method or you remove the factors using `subset_factors`.

5 Inspect MOFA object

The MOFAobject consists of multiple slots where relevant data and information is stored. For descriptions, you can read the documentation by ?MOFA

slotNames(MOFAmodel)
##  [1] "data"               "intercepts"         "imputed_data"      
##  [4] "samples_metadata"   "features_metadata"  "expectations"      
##  [7] "training_stats"     "training_options"   "stochastic_options"
## [10] "data_options"       "model_options"      "dimensions"        
## [13] "on_disk"            "dim_red"            "cache"             
## [16] "status"

5.1 Sample metadata:

Here additional meta-data can be stored for all samples. In this object we have sample names and group-label (sex).

head(MOFAmodel@samples_metadata)
##   sample group
## 1   H045     m
## 2   H109     m
## 3   H024     m
## 4   H056     m
## 5   H079     m
## 6   H059     m

5.2 Overview of training data

The function plot_data_overview can be used to obtain an overview of the input data. It shows how many views (rows) and how many groups (columns) exist, what are their corresponding dimensionalities and how many missing information they have (grey bars).
In this case we have four views and two groups that correspond to M-CLL and U-CLL samples.

plot_data_overview(MOFAmodel)

6 Plot variance explained

Quantifying the variance explained per factor across groups and views is probably the most important plot that MOFA+ generates. It summarises the (latent) signal from a complex heterogeneous data set in a single figure.

6.1 Plot variance explained per factor across groups

Here, we do not see large difference in the variance explained by the different groups. Like in the first MOFA application the first factor is shared across all omics both for male and female sampels

plot_variance_explained(MOFAmodel, x="group", y="factor")

6.2 Plot total variance explained per group

We can also plot the total variance explained per group (with all factors) by adding the argument plot_total = TRUE. Here, the 5 factors capture between 20% and 30% of variance in each group.

p <- plot_variance_explained(MOFAmodel, x="group", y="factor", plot_total = T)

p[[2]]

6.3 Plot variance explained for individual features

We can also inspect the variance explained by the MOFA factors for individual features.

features2plot <- MOFA2::features(MOFAmodel)$Drugs[grep("D_002|D_001", features(MOFAmodel)$Drugs)]

Variance explained by all factors across all groups

plot_variance_explained_per_feature(
  MOFAmodel, 
  factors = 1,
  groups = "all",
  view = "Drugs",
  features = features2plot
)

7 Characterise Factor 1

7.1 Visualisation of factor values

Each factor ordinates samples along a one-dimensional axis that is centered at zero. Samples with different signs indicate opposite phenotypes, with higher absolute value indicating a stronger phenotype. Here we plot all Factor values and we color samples by IGHV status, which like in the original MOFA publicaiton is captured by Factor 1.

plot_factor(MOFAmodel, 
  factor = 1:2,
  color_by = "IGHV"
) 

Here are other ways of representing the same plot (here only factor 1):

p <- plot_factor(MOFAmodel, 
  factor = 1,
  color_by = "group",
  dot_size = 0.2,      # change dot size
  dodge = T,           # dodge points with different colors
  legend = F,          # remove legend
  add_violin = T,      # add violin plots,
  violin_alpha = 0.25  # transparency of violin plots
)

p

Combinations of factors can be plotted with plot_factors:

plot_factors(MOFAmodel, 
  factors = c(1,2),
  color_by = "group"
)

plot_factors(MOFAmodel, 
  factors = c(1,2),
  color_by = "IGHV"
)

7.2 Visualisation of weights

The weights provide a score for each feature on each factor. Features with no association with the factor are expected to have values close to zero, whereas features with strong association with the factor are expected to have large absolute values. The sign of the loading indicates the direction of the effect: a positive loading indicates that the feature is more active in the cells with positive factor values, and viceversa. \ Let’s plot the distribution of weights for Factor 1. IGHV in the genomics views has highest weight on this factor.

plot_weights(MOFAmodel,
  view = "Drugs",
  factor = 1,
  nfeatures = 10,     # Top number of features to highlight
  scale = T           # Scale weights from -1 to 1
)

plot_weights(MOFAmodel,
  view = "Mutations",
  factor = 1,
  nfeatures = 10,     # Top number of features to highlight
  scale = T           # Scale weights from -1 to 1
)

8 Enrichment analysis

In addition to exploring the weights for each factor, we can use enrichment analysis to look for siginificant associaitons of factors to genesets. Here, we use the Reactome genesets for illustrations, which is contained in the MOFAdata package.

# load geneset annotations 
utils::data(reactomeGS)

# run enrichment analysis
res <- run_enrichment(MOFAmodel, feature.sets = reactomeGS, "mRNA")
## Running feature set Enrichment Analysis with the following options...
## View: mRNA
## Number of feature sets: 501
## Local statistic: loading
## Transformation: abs.value
## Global statistic: mean.diff
## Statistical test: parametric
# show overivew of significant pathways per factor
plot_enrichment_heatmap(res, alpha = 0.05)

# show enriched gene sets on a given factor
plot_enrichment(res, factor = 3, alpha = 0.05)

# show top genes within the pathways
plot_enrichment_detailed(MOFAmodel, factor = 3, enrichment.results = res,
                         feature.sets = reactomeGS, max.pathways = 3, alpha = 0.05)

9 SessionInfo

sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] MultiAssayExperiment_1.10.4 SummarizedExperiment_1.14.1
##  [3] DelayedArray_0.10.0         BiocParallel_1.18.1        
##  [5] matrixStats_0.55.0          Biobase_2.44.0             
##  [7] GenomicRanges_1.36.1        GenomeInfoDb_1.20.0        
##  [9] IRanges_2.18.3              S4Vectors_0.22.1           
## [11] BiocGenerics_0.30.0         MOFAdata_1.0.0             
## [13] MOFA2_0.99.3                BiocStyle_2.12.0           
## 
## loaded via a namespace (and not attached):
##  [1] ggrepel_0.8.1          Rcpp_1.0.2             lattice_0.20-38       
##  [4] tidyr_1.0.0            assertthat_0.2.1       zeallot_0.1.0         
##  [7] digest_0.6.22          plyr_1.8.4             R6_2.4.0              
## [10] backports_1.1.5        evaluate_0.14          ggplot2_3.2.1         
## [13] pillar_1.4.2           zlibbioc_1.30.0        rlang_0.4.1           
## [16] lazyeval_0.2.2         Matrix_1.2-17          reticulate_1.13       
## [19] rmarkdown_1.16         labeling_0.3           stringr_1.4.0         
## [22] pheatmap_1.0.12        RCurl_1.95-4.12        munsell_0.5.0         
## [25] HDF5Array_1.12.3       compiler_3.6.1         xfun_0.10             
## [28] pkgconfig_2.0.3        htmltools_0.4.0        tidyselect_0.2.5      
## [31] tibble_2.1.3           GenomeInfoDbData_1.2.1 bookdown_0.14         
## [34] withr_2.1.2            crayon_1.3.4           dplyr_0.8.3           
## [37] bitops_1.0-6           grid_3.6.1             jsonlite_1.6          
## [40] gtable_0.3.0           lifecycle_0.1.0        magrittr_1.5          
## [43] scales_1.0.0           stringi_1.4.3          reshape2_1.4.3        
## [46] XVector_0.24.0         vctrs_0.2.0            cowplot_1.0.0         
## [49] Rhdf5lib_1.6.3         RColorBrewer_1.1-2     forcats_0.4.0         
## [52] tools_3.6.1            glue_1.3.1             purrr_0.3.3           
## [55] yaml_2.2.0             colorspace_1.4-1       rhdf5_2.28.1          
## [58] BiocManager_1.30.9     corrplot_0.84          knitr_1.25