evaluate_clonealign.Rd
Use mean squared error of predicting the expression of held-out genes to evaluate the quality of a clonealign fit.
evaluate_clonealign(gene_expression_data, copy_number_data, clonealign_fit, prop_holdout = 0.2, n_samples = 100, s = NULL, ...)
gene_expression_data | Either a |
---|---|
copy_number_data | A gene-by-clone matrix of copy numbers, the same as the input to |
clonealign_fit | The fit returned by a call to |
prop_holdout | The proportion of genes to hold out as a test set for predicting gene expression |
n_samples | The number of random permutations to establish a null distribution of predictive performance |
s | Vector of cell size factors - defaults to the total counts per cell |
... | Additional arguments to pass to the |
A list object describing the evaluation results. See details
This evaluation function is built around the idea of how good predicted expression is under the model given the inferred (assigned) clones compared to how well you could predict expression given a random clonal assignment.
Evaluations performed
On the full dataset, the mean square error is compared to the randomized error, and a message printed about the ratio of the two errors and the proportion of time the observed error was less than the error under a null distribution. If either the error under the null is smaller than the observed, or is smaller some percentage of time, then the fit has real problems and should be abandoned as it suggests the model is stuck in a local maxima (which could happen if the proposed clones aren't actually present in the RNA-seq).
A certain proportion of genes (as decided by the prop_holdout
parameter) are held out as a test
set, and the clonealign fit is re-performed on the 1 - prop_holdout
proportion of genes. The function will then
print an agreement table of clone assignments between the full table and the reduced table. If these vastly disagree
for only a small change in gene set (ie prop_holdout
is around 0.1 or 0.2), then the fit may be unreliable
as it is sensitive to the genes inputted.
The same metrics from (1) are then printed where the evaluations are performed on the heldout set. Again, if the observed mean squared error given the clonealign fit isn't less than the null mean squared error a large proportion of the time, the fit may be unreliable.
Object returned
Everything computed above is returned in a list object with the following entries:
full_clonealign_fit - the original clonealign fit on the full gene set that was passed in as the
clonealign_fit
argument
full_observed_mse - the observed mean square error using the full gene set
full_null_mse - A vector of mean square error under the randomized (permuted) distribution
reduced_clonealign_fit - a clonealign fit on only the reduced (train) set of genes
heldout_inds - the indices of the genes held out (test set) for evaluating predictive performance out-of-sample
kept_inds - the indices of retained genes as part of the reduced (train) set
heldout_observed_mse - the observed mean square error on the heldout (test) set of genes
heldout_null_mse - a vector of mean square errors under a null distribution of randomly permuting the clones
library(SingleCellExperiment) data(example_clonealign_fit) data(example_sce) copy_number_data <- rowData(example_sce)[,c("A", "B", "C")] evaluate_clonealign(example_sce, copy_number_data, example_clonealign_fit)#> On the full dataset, the observed MSE was on average 1.00068199101321 times smaller than under a null model and smaller 100% of the time#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#>#> Agreement between original (rows) and reduced (columns): #> A B C #> A 51 33 1 #> B 2 100 9 #> C 0 0 4 #> On the held-out dataset, the observed MSE was on average 1.00104532277777 times smaller than under a null model and smaller 100% of the time#> $full_clonealign_fit #> A clonealign_fit for 200 cells, 100 genes, and 3 clones #> To access clone assignments, call x$clone #> To access ML parameter estimates, call x$ml_params #> $full_observed_mse #> [1] 8.165564 #> #> $full_null_mse #> [1] 8.154310 8.166977 8.188727 8.181782 8.165159 8.167555 8.179075 8.172130 #> [9] 8.184131 8.165554 8.187043 8.180571 8.176795 8.171300 8.157375 8.175110 #> [17] 8.185641 8.179719 8.171748 8.192153 8.161636 8.155985 8.154671 8.171442 #> [25] 8.187066 8.178323 8.170458 8.165517 8.172348 8.159575 8.174497 8.168530 #> [33] 8.179797 8.163616 8.189642 8.171136 8.169682 8.176496 8.160318 8.173017 #> [41] 8.173295 8.167151 8.179799 8.173885 8.180725 8.175595 8.177047 8.152684 #> [49] 8.173981 8.186816 8.157725 8.179524 8.165856 8.165767 8.164624 8.183834 #> [57] 8.177253 8.178063 8.149822 8.152051 8.171897 8.156803 8.181463 8.168415 #> [65] 8.163356 8.159418 8.174528 8.180864 8.169265 8.176029 8.195466 8.176538 #> [73] 8.166471 8.180492 8.168597 8.170262 8.180743 8.175697 8.173597 8.178742 #> [81] 8.164570 8.168137 8.175971 8.174829 8.157045 8.174294 8.169496 8.156630 #> [89] 8.165271 8.164037 8.170882 8.158499 8.175376 8.164344 8.169433 8.157970 #> [97] 8.160361 8.165938 8.167727 8.161714 #> #> $reduced_clonealign_fit #> A clonealign_fit for 200 cells, 80 genes, and 3 clones #> To access clone assignments, call x$clone #> To access ML parameter estimates, call x$ml_params #> $heldout_inds #> [1] 72 66 60 90 3 22 52 80 17 9 87 1 81 37 20 59 98 34 67 45 #> #> $kept_inds #> [1] 2 4 5 6 7 8 10 11 12 13 14 15 16 18 19 21 23 24 25 #> [20] 26 27 28 29 30 31 32 33 35 36 38 39 40 41 42 43 44 46 47 #> [39] 48 49 50 51 53 54 55 56 57 58 61 62 63 64 65 68 69 70 71 #> [58] 73 74 75 76 77 78 79 82 83 84 85 86 88 89 91 92 93 94 95 #> [77] 96 97 99 100 #> #> $heldout_observed_mse #> [1] 5.060314 #> #> $heldout_null_mse #> [1] 5.047976 5.079231 5.045589 5.072601 5.038944 5.068903 5.059268 5.018302 #> [9] 5.076334 5.085652 5.092805 5.054970 5.073584 5.085130 5.032874 5.030923 #> [17] 5.062821 5.058825 5.065252 5.069371 5.092889 5.066601 5.079692 5.064945 #> [25] 5.080774 5.075409 5.022453 5.050223 5.058505 5.083837 5.073288 5.063863 #> [33] 5.040052 5.089590 5.068245 5.045626 5.078114 5.020177 5.114059 5.075857 #> [41] 5.048223 5.061796 5.047791 5.048481 5.070281 5.045415 5.083511 5.088473 #> [49] 5.113912 5.052013 5.078123 5.051769 5.082680 5.061381 5.085314 5.071214 #> [57] 5.101640 5.079185 5.039494 5.071479 5.044071 5.095103 5.073446 5.049466 #> [65] 5.100744 5.066650 5.063223 5.062352 5.067454 5.039386 5.057637 5.065010 #> [73] 5.052644 5.049152 5.047871 5.072357 5.062026 5.054460 5.073883 5.060295 #> [81] 5.047926 5.087664 5.051152 5.066241 5.077961 5.068980 5.076766 5.074270 #> [89] 5.070654 5.079098 5.041946 5.083532 5.048877 5.100982 5.036633 5.074286 #> [97] 5.050818 5.078271 5.076683 5.064705 #>