MOFA2 0.99.3
This vignette show how to use MOFA+ on the bulk multi-omics data set that was used in thr original publication of MOFA and the vignette of the MOFA package. The data consisted of four omics including methylation, RNAseq, genomic and drug response data. In the first application of MOFA we considered all 200 patients as a single group. Here, we will separately consider samples from male and female patients and apply multi-group inference using MOFA+.
library(MOFA2)
## Warning: replacing previous import 'MultiAssayExperiment::complete.cases' by
## 'stats::complete.cases' when loading 'MOFA2'
library(MOFAdata)
utils::data("CLL_data")
utils::data("CLL_covariates")
MOFAobject <- create_mofa(CLL_data, groups = CLL_covariates$Gender)
## Creating MOFA object from a list of matrices (features as rows, sample as columns)...
## You have requested the multi-group inference framework.
## It is an advanced option, if this is the first time that you are running MOFA, we suggest that you try first without specifying groups
MOFAobject
## Untrained MOFA model with the following characteristics:
## Number of views: 4
## Views names: Drugs Methylation mRNA Mutations
## Number of features (per view): 310 4248 5000 69
## Number of groups: 2
## Groups names: m f
## Number of samples (per group): 121 79
Instead of a list of matrices as above, you can also provide a MultiAssayExperiment object and build the MOFA+ object based on this. Covariates stored in the colData
slot of the MultiAssayExperiment can be used to specify groups.
library(MultiAssayExperiment)
# Create MultiAssayExperiment object
mae_CLL <- MultiAssayExperiment(
experiments = CLL_data,
colData = CLL_covariates
)
# Build the MOFA object
MOFAobject <- create_mofa(mae_CLL, groups = "Gender")
## You have requested the multi-group inference framework.
## It is an advanced option, if this is the first time that you are running MOFA, we suggest that you try first without specifying groups
MOFAobject
## Untrained MOFA model with the following characteristics:
## Number of views: 4
## Views names: Drugs Methylation mRNA Mutations
## Number of features (per view): 310 4248 5000 69
## Number of groups: 2
## Groups names: m f
## Number of samples (per group): 121 79
Data options: let’s use default
data_opts <- get_default_data_options(MOFAobject)
Model options: let’s use 5 factors
model_opts <- get_default_model_options(MOFAobject)
model_opts$num_factors <- 5
Training options
train_opts <- get_default_training_options(MOFAobject)
train_opts$convergence_mode <- "fast"
train_opts$seed <- 123
MOFAobject <- prepare_mofa(
object = MOFAobject,
data_options = data_opts,
model_options = model_opts,
training_options = train_opts
)
## Checking data options...
## Checking training options...
## Checking model options...
This step takes around 1 min.
outfile = paste0(getwd(),"/CLLmodel.hdf5")
MOFAmodel <- run_mofa(MOFAobject, outfile = outfile)
## Warning: Output file already exists, it will be replaced
## Warning in quality_control(object, verbose = verbose): Factor(s) 3 are strongly correlated with the total number of expressed features for at least one of your omics. Such factors appear when there are global differences between your samples, sometimes because of poor normalisation in the preprocessing steps. We recommend that you either try a better normalisation method or you remove the factors using `subset_factors`.
MOFA models are saved in hdf5 format and can be loaded into R with the function load_model
.
MOFAmodel <- load_model(outfile)
## Warning in quality_control(object, verbose = verbose): Factor(s) 3 are strongly correlated with the total number of expressed features for at least one of your omics. Such factors appear when there are global differences between your samples, sometimes because of poor normalisation in the preprocessing steps. We recommend that you either try a better normalisation method or you remove the factors using `subset_factors`.
The MOFAobject consists of multiple slots where relevant data and information is stored. For descriptions, you can read the documentation by ?MOFA
slotNames(MOFAmodel)
## [1] "data" "intercepts" "imputed_data"
## [4] "samples_metadata" "features_metadata" "expectations"
## [7] "training_stats" "training_options" "stochastic_options"
## [10] "data_options" "model_options" "dimensions"
## [13] "on_disk" "dim_red" "cache"
## [16] "status"
Here additional meta-data can be stored for all samples. In this object we have sample names and group-label (sex).
head(MOFAmodel@samples_metadata)
## sample group
## 1 H045 m
## 2 H109 m
## 3 H024 m
## 4 H056 m
## 5 H079 m
## 6 H059 m
The function plot_data_overview
can be used to obtain an overview of the input data.
It shows how many views (rows) and how many groups (columns) exist, what are their corresponding dimensionalities and how many missing information they have (grey bars).
In this case we have four views and two groups that correspond to M-CLL and U-CLL samples.
plot_data_overview(MOFAmodel)
Quantifying the variance explained per factor across groups and views is probably the most important plot that MOFA+ generates. It summarises the (latent) signal from a complex heterogeneous data set in a single figure.
Here, we do not see large difference in the variance explained by the different groups. Like in the first MOFA application the first factor is shared across all omics both for male and female sampels
plot_variance_explained(MOFAmodel, x="group", y="factor")
We can also plot the total variance explained per group (with all factors) by adding the argument plot_total = TRUE
. Here, the 5 factors capture between 20% and 30% of variance in each group.
p <- plot_variance_explained(MOFAmodel, x="group", y="factor", plot_total = T)
p[[2]]
We can also inspect the variance explained by the MOFA factors for individual features.
features2plot <- MOFA2::features(MOFAmodel)$Drugs[grep("D_002|D_001", features(MOFAmodel)$Drugs)]
Variance explained by all factors across all groups
plot_variance_explained_per_feature(
MOFAmodel,
factors = 1,
groups = "all",
view = "Drugs",
features = features2plot
)
Each factor ordinates samples along a one-dimensional axis that is centered at zero. Samples with different signs indicate opposite phenotypes, with higher absolute value indicating a stronger phenotype. Here we plot all Factor values and we color samples by IGHV status, which like in the original MOFA publicaiton is captured by Factor 1.
plot_factor(MOFAmodel,
factor = 1:2,
color_by = "IGHV"
)
Here are other ways of representing the same plot (here only factor 1):
p <- plot_factor(MOFAmodel,
factor = 1,
color_by = "group",
dot_size = 0.2, # change dot size
dodge = T, # dodge points with different colors
legend = F, # remove legend
add_violin = T, # add violin plots,
violin_alpha = 0.25 # transparency of violin plots
)
p
Combinations of factors can be plotted with plot_factors
:
plot_factors(MOFAmodel,
factors = c(1,2),
color_by = "group"
)
plot_factors(MOFAmodel,
factors = c(1,2),
color_by = "IGHV"
)
The weights provide a score for each feature on each factor. Features with no association with the factor are expected to have values close to zero, whereas features with strong association with the factor are expected to have large absolute values. The sign of the loading indicates the direction of the effect: a positive loading indicates that the feature is more active in the cells with positive factor values, and viceversa. \ Let’s plot the distribution of weights for Factor 1. IGHV in the genomics views has highest weight on this factor.
plot_weights(MOFAmodel,
view = "Drugs",
factor = 1,
nfeatures = 10, # Top number of features to highlight
scale = T # Scale weights from -1 to 1
)
plot_weights(MOFAmodel,
view = "Mutations",
factor = 1,
nfeatures = 10, # Top number of features to highlight
scale = T # Scale weights from -1 to 1
)
In addition to exploring the weights for each factor, we can use enrichment analysis to look for siginificant associaitons of factors to genesets. Here, we use the Reactome genesets for illustrations, which is contained in the MOFAdata
package.
# load geneset annotations
utils::data(reactomeGS)
# run enrichment analysis
res <- run_enrichment(MOFAmodel, feature.sets = reactomeGS, "mRNA")
## Running feature set Enrichment Analysis with the following options...
## View: mRNA
## Number of feature sets: 501
## Local statistic: loading
## Transformation: abs.value
## Global statistic: mean.diff
## Statistical test: parametric
# show overivew of significant pathways per factor
plot_enrichment_heatmap(res, alpha = 0.05)
# show enriched gene sets on a given factor
plot_enrichment(res, factor = 3, alpha = 0.05)
# show top genes within the pathways
plot_enrichment_detailed(MOFAmodel, factor = 3, enrichment.results = res,
feature.sets = reactomeGS, max.pathways = 3, alpha = 0.05)
sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] MultiAssayExperiment_1.10.4 SummarizedExperiment_1.14.1
## [3] DelayedArray_0.10.0 BiocParallel_1.18.1
## [5] matrixStats_0.55.0 Biobase_2.44.0
## [7] GenomicRanges_1.36.1 GenomeInfoDb_1.20.0
## [9] IRanges_2.18.3 S4Vectors_0.22.1
## [11] BiocGenerics_0.30.0 MOFAdata_1.0.0
## [13] MOFA2_0.99.3 BiocStyle_2.12.0
##
## loaded via a namespace (and not attached):
## [1] ggrepel_0.8.1 Rcpp_1.0.2 lattice_0.20-38
## [4] tidyr_1.0.0 assertthat_0.2.1 zeallot_0.1.0
## [7] digest_0.6.22 plyr_1.8.4 R6_2.4.0
## [10] backports_1.1.5 evaluate_0.14 ggplot2_3.2.1
## [13] pillar_1.4.2 zlibbioc_1.30.0 rlang_0.4.1
## [16] lazyeval_0.2.2 Matrix_1.2-17 reticulate_1.13
## [19] rmarkdown_1.16 labeling_0.3 stringr_1.4.0
## [22] pheatmap_1.0.12 RCurl_1.95-4.12 munsell_0.5.0
## [25] HDF5Array_1.12.3 compiler_3.6.1 xfun_0.10
## [28] pkgconfig_2.0.3 htmltools_0.4.0 tidyselect_0.2.5
## [31] tibble_2.1.3 GenomeInfoDbData_1.2.1 bookdown_0.14
## [34] withr_2.1.2 crayon_1.3.4 dplyr_0.8.3
## [37] bitops_1.0-6 grid_3.6.1 jsonlite_1.6
## [40] gtable_0.3.0 lifecycle_0.1.0 magrittr_1.5
## [43] scales_1.0.0 stringi_1.4.3 reshape2_1.4.3
## [46] XVector_0.24.0 vctrs_0.2.0 cowplot_1.0.0
## [49] Rhdf5lib_1.6.3 RColorBrewer_1.1-2 forcats_0.4.0
## [52] tools_3.6.1 glue_1.3.1 purrr_0.3.3
## [55] yaml_2.2.0 colorspace_1.4-1 rhdf5_2.28.1
## [58] BiocManager_1.30.9 corrplot_0.84 knitr_1.25