Use mean squared error of predicting the expression of held-out genes to evaluate the quality of a clonealign fit.

evaluate_clonealign(gene_expression_data, copy_number_data, clonealign_fit,
  prop_holdout = 0.2, n_samples = 100, s = NULL, ...)

Arguments

gene_expression_data

Either a SingleCellExperiment or matrix of counts, same as input to clonealign

copy_number_data

A gene-by-clone matrix of copy numbers, the same as the input to clonealign

clonealign_fit

The fit returned by a call to clonealign() on the full geneset

prop_holdout

The proportion of genes to hold out as a test set for predicting gene expression

n_samples

The number of random permutations to establish a null distribution of predictive performance

s

Vector of cell size factors - defaults to the total counts per cell

...

Additional arguments to pass to the clonealign call

Value

A list object describing the evaluation results. See details

Details

This evaluation function is built around the idea of how good predicted expression is under the model given the inferred (assigned) clones compared to how well you could predict expression given a random clonal assignment.

Evaluations performed

  1. On the full dataset, the mean square error is compared to the randomized error, and a message printed about the ratio of the two errors and the proportion of time the observed error was less than the error under a null distribution. If either the error under the null is smaller than the observed, or is smaller some percentage of time, then the fit has real problems and should be abandoned as it suggests the model is stuck in a local maxima (which could happen if the proposed clones aren't actually present in the RNA-seq).

  2. A certain proportion of genes (as decided by the prop_holdout parameter) are held out as a test set, and the clonealign fit is re-performed on the 1 - prop_holdout proportion of genes. The function will then print an agreement table of clone assignments between the full table and the reduced table. If these vastly disagree for only a small change in gene set (ie prop_holdout is around 0.1 or 0.2), then the fit may be unreliable as it is sensitive to the genes inputted.

  3. The same metrics from (1) are then printed where the evaluations are performed on the heldout set. Again, if the observed mean squared error given the clonealign fit isn't less than the null mean squared error a large proportion of the time, the fit may be unreliable.

Object returned

Everything computed above is returned in a list object with the following entries:

  • full_clonealign_fit - the original clonealign fit on the full gene set that was passed in as the clonealign_fit argument

  • full_observed_mse - the observed mean square error using the full gene set

  • full_null_mse - A vector of mean square error under the randomized (permuted) distribution

  • reduced_clonealign_fit - a clonealign fit on only the reduced (train) set of genes

  • heldout_inds - the indices of the genes held out (test set) for evaluating predictive performance out-of-sample

  • kept_inds - the indices of retained genes as part of the reduced (train) set

  • heldout_observed_mse - the observed mean square error on the heldout (test) set of genes

  • heldout_null_mse - a vector of mean square errors under a null distribution of randomly permuting the clones

Examples

library(SingleCellExperiment) data(example_clonealign_fit) data(example_sce) copy_number_data <- rowData(example_sce)[,c("A", "B", "C")] evaluate_clonealign(example_sce, copy_number_data, example_clonealign_fit)
#> On the full dataset, the observed MSE was on average 1.00068199101321 times smaller than under a null model and smaller 100% of the time
#> Fitting reduced clonealign model...
#>
#> running VB [>--------------------------------] 2% | change in elbo 0.0096%
#>
#> running VB [>--------------------------------] 3% | change in elbo 0.0070%
#>
#> running VB [>--------------------------------] 4% | change in elbo 0.0053%
#>
#> running VB [=>-------------------------------] 5% | change in elbo 0.0015%
#>
#> running VB [=>-------------------------------] 6% | change in elbo 0.0039%
#>
#> running VB [=>-------------------------------] 7% | change in elbo 0.0033%
#>
#> running VB [==>------------------------------] 8% | change in elbo 0.0032%
#>
#> running VB [==>------------------------------] 9% | change in elbo 0.0022%
#>
#> running VB [==>------------------------------] 10% | change in elbo 0.0015%
#>
#> running VB [===>-----------------------------] 11% | change in elbo 0.0023%
#>
#> running VB [===>-----------------------------] 12% | change in elbo 0.0015%
#>
#> running VB [===>-----------------------------] 13% | change in elbo 0.0017%
#>
#> running VB [====>----------------------------] 14% | change in elbo 0.0014%
#>
#> running VB [====>---------------------------] 15% | change in elbo 0.00092%
#>
#> running VB [====>----------------------------] 16% | change in elbo 0.0019%
#>
#> running VB [=====>---------------------------] 17% | change in elbo 0.0014%
#>
#> running VB [=====>---------------------------] 18% | change in elbo 0.0015%
#>
#> running VB [=====>------------------------] 19% | change in elbo -0.000064%
#>
#> running VB [======>--------------------------] 20% | change in elbo 0.0019%
#>
#> running VB [======>--------------------------] 21% | change in elbo 0.0016%
#>
#> running VB [======>-------------------------] 22% | change in elbo 0.00080%
#>
#> running VB [=======>-------------------------] 23% | change in elbo 0.0015%
#>
#> running VB [=======>------------------------] 24% | change in elbo 0.00050%
#>
#> running VB [=======>------------------------] 25% | change in elbo 0.00018%
#>
#> running VB [=======>-----------------------] 26% | change in elbo -0.00022%
#>
#> running VB [========>------------------------] 27% | change in elbo 0.0023%
#>
#> running VB [========>-----------------------] 28% | change in elbo 0.00071%
#>
#> running VB [========>------------------------] 29% | change in elbo 0.0011%
#>
#> running VB [========>----------------------] 30% | change in elbo -0.00064%
#>
#> running VB [=========>-----------------------] 31% | change in elbo 0.0011%
#>
#> running VB [=========>----------------------] 32% | change in elbo 0.00023%
#>
#> running VB [=========>----------------------] 33% | change in elbo 0.00093%
#>
#> running VB [=========>---------------------] 34% | change in elbo -0.00051%
#>
#> running VB [==========>----------------------] 35% | change in elbo 0.0016%
#>
#> running VB [==========>--------------------] 36% | change in elbo -0.00023%
#>
#> running VB [===========>---------------------] 37% | change in elbo 0.0010%
#>
#> running VB [===========>-------------------] 38% | change in elbo -0.00081%
#>
#> running VB [============>--------------------] 39% | change in elbo 0.0023%
#>
#> running VB [===========>-------------------] 40% | change in elbo 0.000047%
#>
#> running VB [============>------------------] 41% | change in elbo -0.00077%
#>
#> running VB [============>-------------------] 42% | change in elbo 0.00096%
#>
#> running VB [============>------------------] 43% | change in elbo -0.00097%
#>
#> running VB [=============>------------------] 44% | change in elbo 0.00014%
#>
#> running VB [=============>------------------] 45% | change in elbo 0.00090%
#>
#> running VB [==============>-----------------] 46% | change in elbo 0.00092%
#>
#> running VB [==============>------------------] 47% | change in elbo 0.0011%
#>
#> running VB [==============>-----------------] 48% | change in elbo -0.0013%
#>
#> running VB [===============>----------------] 49% | change in elbo -0.0011%
#>
#> running VB [===============>-----------------] 50% | change in elbo 0.0029%
#>
#> running VB [===============>----------------] 50% | change in elbo 0.00034%
#>
#> running VB [===============>---------------] 51% | change in elbo -0.00052%
#>
#> running VB [===============>--------------] 52% | change in elbo -0.000024%
#>
#> running VB [================>---------------] 53% | change in elbo 0.00064%
#>
#> running VB [================>--------------] 54% | change in elbo -0.00053%
#>
#> running VB [================>--------------] 55% | change in elbo -0.00031%
#>
#> running VB [=================>--------------] 56% | change in elbo 0.00091%
#>
#> running VB [=================>--------------] 57% | change in elbo -0.0010%
#>
#> running VB [==================>--------------] 58% | change in elbo 0.0013%
#>
#> running VB [=================>-------------] 59% | change in elbo -0.00097%
#>
#> running VB [===================>-------------] 60% | change in elbo 0.0011%
#>
#> running VB [==================>------------] 61% | change in elbo -0.00020%
#>
#> running VB [===================>------------] 62% | change in elbo 0.00016%
#>
#> running VB [===================>------------] 63% | change in elbo -0.0021%
#>
#> running VB [====================>------------] 64% | change in elbo 0.0021%
#>
#> running VB [===================>-----------] 65% | change in elbo -0.00032%
#>
#> running VB [====================>----------] 66% | change in elbo 0.000086%
#>
#> running VB [=====================>-----------] 67% | change in elbo 0.0017%
#>
#> running VB [=====================>----------] 68% | change in elbo -0.0026%
#>
#> running VB [======================>----------] 69% | change in elbo 0.0016%
#>
#> running VB [=====================>----------] 70% | change in elbo 0.00090%
#>
#> running VB [=====================>---------] 71% | change in elbo -0.00016%
#>
#> running VB [=====================>---------] 72% | change in elbo -0.00028%
#>
#> running VB [======================>--------] 73% | change in elbo 0.000047%
#>
#> running VB [======================>--------] 74% | change in elbo -0.00094%
#>
#> running VB [=======================>--------] 75% | change in elbo 0.00023%
#>
#> running VB [=======================>--------] 76% | change in elbo 0.00030%
#>
#> running VB [=======================>-------] 77% | change in elbo 0.000049%
#>
#> running VB [========================>-------] 78% | change in elbo 0.00029%
#>
#> running VB [========================>------] 79% | change in elbo -0.00082%
#>
#> running VB [=========================>-------] 80% | change in elbo 0.0011%
#>
#> running VB [========================>------] 81% | change in elbo -0.00047%
#>
#> running VB [=========================>------] 82% | change in elbo 0.00067%
#>
#> running VB [=========================>-----] 83% | change in elbo -0.00077%
#>
#> running VB [==========================>-----] 84% | change in elbo 0.00048%
#>
#> running VB [==========================>-----] 85% | change in elbo 0.00032%
#>
#> running VB [=========================>----] 86% | change in elbo -0.000031%
#>
#> running VB [==========================>----] 87% | change in elbo -0.00091%
#>
#> running VB [===========================>----] 88% | change in elbo 0.00083%
#>
#> running VB [===========================>---] 89% | change in elbo -0.00097%
#>
#> running VB [============================>---] 90% | change in elbo 0.00092%
#>
#> running VB [===========================>---] 91% | change in elbo -0.00020%
#>
#> running VB [============================>---] 92% | change in elbo 0.00064%
#>
#> running VB [============================>--] 93% | change in elbo -0.00035%
#>
#> running VB [=============================>--] 94% | change in elbo 0.00016%
#>
#> running VB [============================>--] 95% | change in elbo -0.00029%
#>
#> running VB [============================>-] 96% | change in elbo -0.000041%
#>
#> running VB [==============================>-] 97% | change in elbo 0.00025%
#>
#> running VB [============================>-] 98% | change in elbo -0.000051%
#>
#> running VB [===============================>] 99% | change in elbo 0.00031%
#> Agreement between original (rows) and reduced (columns): #> A B C #> A 51 33 1 #> B 2 100 9 #> C 0 0 4 #> On the held-out dataset, the observed MSE was on average 1.00104532277777 times smaller than under a null model and smaller 100% of the time
#> $full_clonealign_fit #> A clonealign_fit for 200 cells, 100 genes, and 3 clones #> To access clone assignments, call x$clone #> To access ML parameter estimates, call x$ml_params #> $full_observed_mse #> [1] 8.165564 #> #> $full_null_mse #> [1] 8.154310 8.166977 8.188727 8.181782 8.165159 8.167555 8.179075 8.172130 #> [9] 8.184131 8.165554 8.187043 8.180571 8.176795 8.171300 8.157375 8.175110 #> [17] 8.185641 8.179719 8.171748 8.192153 8.161636 8.155985 8.154671 8.171442 #> [25] 8.187066 8.178323 8.170458 8.165517 8.172348 8.159575 8.174497 8.168530 #> [33] 8.179797 8.163616 8.189642 8.171136 8.169682 8.176496 8.160318 8.173017 #> [41] 8.173295 8.167151 8.179799 8.173885 8.180725 8.175595 8.177047 8.152684 #> [49] 8.173981 8.186816 8.157725 8.179524 8.165856 8.165767 8.164624 8.183834 #> [57] 8.177253 8.178063 8.149822 8.152051 8.171897 8.156803 8.181463 8.168415 #> [65] 8.163356 8.159418 8.174528 8.180864 8.169265 8.176029 8.195466 8.176538 #> [73] 8.166471 8.180492 8.168597 8.170262 8.180743 8.175697 8.173597 8.178742 #> [81] 8.164570 8.168137 8.175971 8.174829 8.157045 8.174294 8.169496 8.156630 #> [89] 8.165271 8.164037 8.170882 8.158499 8.175376 8.164344 8.169433 8.157970 #> [97] 8.160361 8.165938 8.167727 8.161714 #> #> $reduced_clonealign_fit #> A clonealign_fit for 200 cells, 80 genes, and 3 clones #> To access clone assignments, call x$clone #> To access ML parameter estimates, call x$ml_params #> $heldout_inds #> [1] 72 66 60 90 3 22 52 80 17 9 87 1 81 37 20 59 98 34 67 45 #> #> $kept_inds #> [1] 2 4 5 6 7 8 10 11 12 13 14 15 16 18 19 21 23 24 25 #> [20] 26 27 28 29 30 31 32 33 35 36 38 39 40 41 42 43 44 46 47 #> [39] 48 49 50 51 53 54 55 56 57 58 61 62 63 64 65 68 69 70 71 #> [58] 73 74 75 76 77 78 79 82 83 84 85 86 88 89 91 92 93 94 95 #> [77] 96 97 99 100 #> #> $heldout_observed_mse #> [1] 5.060314 #> #> $heldout_null_mse #> [1] 5.047976 5.079231 5.045589 5.072601 5.038944 5.068903 5.059268 5.018302 #> [9] 5.076334 5.085652 5.092805 5.054970 5.073584 5.085130 5.032874 5.030923 #> [17] 5.062821 5.058825 5.065252 5.069371 5.092889 5.066601 5.079692 5.064945 #> [25] 5.080774 5.075409 5.022453 5.050223 5.058505 5.083837 5.073288 5.063863 #> [33] 5.040052 5.089590 5.068245 5.045626 5.078114 5.020177 5.114059 5.075857 #> [41] 5.048223 5.061796 5.047791 5.048481 5.070281 5.045415 5.083511 5.088473 #> [49] 5.113912 5.052013 5.078123 5.051769 5.082680 5.061381 5.085314 5.071214 #> [57] 5.101640 5.079185 5.039494 5.071479 5.044071 5.095103 5.073446 5.049466 #> [65] 5.100744 5.066650 5.063223 5.062352 5.067454 5.039386 5.057637 5.065010 #> [73] 5.052644 5.049152 5.047871 5.072357 5.062026 5.054460 5.073883 5.060295 #> [81] 5.047926 5.087664 5.051152 5.066241 5.077961 5.068980 5.076766 5.074270 #> [89] 5.070654 5.079098 5.041946 5.083532 5.048877 5.100982 5.036633 5.074286 #> [97] 5.050818 5.078271 5.076683 5.064705 #>