This is short primer on the use of medianBootstrap for the analysis of cell-to-cell spread data, using an example from Cheval et al. (2020) under CC BY 4.0.
Currently, most of the literature uses a Mann-Whitney-Wilcox test to analyse GFP cell-to-cell spread data. Here, we propose a medianBootstrap method, which accounts for the varying shape and variance between control and treatment/genotype conditions.
## Load functions from medianBootstrap.R
# source(medianBootstrap.R)
# or
mcp_ci <- function(success, trials, alpha){
## Copyright (C) 2001 Frank E Harrell Jr
## Modified by Matthew G Johnston 2020 from binconf in the Hmisc package
## Distributed under GNU General Public License v2 (or later) without any warranty. See the GNU General Public License for more details.
zcrit <- - qnorm(alpha/2)
z2 <- zcrit * zcrit
mc_p <- success/trials
cl <- (mc_p + z2/2/trials + c(-1, 1) * zcrit *
sqrt((mc_p * (1 - mc_p) + z2/4/trials)/trials))/(1 + z2/trials)
if(success == 1)
cl[1] <- - log(1 - alpha)/trials
if(success == (trials - 1))
cl[2] <- 1 + log(1 - alpha)/trials
return(cbind(mc_p, lower_ci=cl[1], upper_ci=cl[2]))
}
medianBootstrap<- function(data1, data2, N=5000, alpha=0.05){
## Calculate observed test statistic
mediandiff<-median(data1)-median(data2)
## Generate the null distribution
boots<-replicate(N, median(sample(data1,length(data1), replace=T))-median(sample(data2,length(data2), replace=T))-mediandiff)
## Count the number of at resampled observations which are at least as extreme
above <- sum(abs(boots)>=abs(mediandiff))
## Calculate p value and confidence intervals
mcp<-mcp_ci(above+1,N+1, alpha)
return(mcp)
}
## Load example data from Cheval et al. (2020)
library(readxl)
ChevalCount <- read_excel("RawData.xlsx",
sheet = 1, col_names = T)
## Extract the relevant data into two vectors and plot them
Mock <- ChevalCount[ChevalCount$treatment=="Mock",]$movement
Chitin <- ChevalCount[ChevalCount$treatment=="Chitin",]$movement
par(mfrow=c(1,2))
hist(Mock)
hist(Chitin)
## Run the statistical test
medianBootstrap(Mock,Chitin, N=5000, alpha=0.05)
## mc_p lower_ci upper_ci
## [1,] 0.00019996 1.025661e-05 0.001131863
We can see here, that the two distributions are remarkably different. By using further bootstraps, we can refine the Monte Carlo p value and reduce the confidence intervals.
## Run the statistical test with more bootstraps
medianBootstrap(Mock,Chitin, N=25000, alpha=0.05)
## mc_p lower_ci upper_ci
## [1,] 3.99984e-05 2.05165e-06 0.0002265524
Instead of using Wilson (1927) confidence intervals, a bootstrap 95% confidence interval can be constructed:
## Generate bootstraped 95% confidence intervals
medianBootstrap_once<- function(data1, data2, N=5000, alpha=0.05){
## Calculate observed test statistic
mediandiff<-median(data1)-median(data2)
## Generate the null distribution
boots<-replicate(N, median(sample(data1,length(data1), replace=T))-median(sample(data2,length(data2), replace=T))-mediandiff)
## Count the number of at resampled observations which are at least as extreme
above <- sum(abs(boots)>=abs(mediandiff))
## Calculate p value
mcp<-(above+1)/(N+1)
return(mcp)
}
pvalue<-medianBootstrap_once(Mock,Chitin, N=5000, alpha=0.05)
interval<-replicate(100,medianBootstrap_once(Mock,Chitin, N=5000, alpha=0.05))
paste("p =", format(round(pvalue,5),nsmall=5), ", 95% CI = [", format(round(quantile(interval, c(0.025)),5),nsmall=5),",",format(round(quantile(interval, c(0.975)),5),nsmall=5),"]")
## [1] "p = 2e-04 , 95% CI = [ 2e-04 , 4e-04 ]"
Alternatively, Clopper-Pearson or Agresti–Coull could be used. The Hmisc package provides functions for this.
You may be interested to also view the null distribution generated by the medianBootstrap script. I provide an example function for this here.
library(ggplot2)
medianBootstrap_plot<- function(data1, data2, N=5000, alpha=0.05){
## Calculate observed test statistic
mediandiff<-median(data1)-median(data2)
## Generate the null distribution
boots<-replicate(N, median(sample(data1,length(data1), replace=T))-median(sample(data2,length(data2), replace=T))-mediandiff)
## Count the number of at resampled observations which are at least as extreme
above <- sum(abs(boots)>=abs(mediandiff))
## Calculate p value and confidence intervals
mcp<-format(round(mcp_ci(above+1,N+1, alpha),3),nsmall=3)
## Plot graph
labeltext1<- substitute(atop(paste(hat(italic("p")))~phantom()==phantom()~AAA,"95% CI"~BBB~CCC), list(AAA=mcp[1], BBB =paste0("[",mcp[2],", "), CCC= paste0(mcp[3],"]")))
plot<-ggplot(data.frame(median=boots), aes(x=abs(median)))+
geom_histogram(alpha=.7, binwidth = 1, aes(y=..density..))+
geom_vline(xintercept = abs(mediandiff), linetype=2, colour = "red4")+
theme_bw()+
xlab(expression(atop("|"~hat(paste(theta, "*"))~-~hat(theta)~"|",theta==Delta[median])))+
ylab("Density")+
annotate("text",x = Inf, y = Inf, hjust = 1.1, vjust = 1.1,label=labeltext1)+
theme(text=element_text(size=15))
return(plot)
}
medianBootstrap_plot(Mock,Chitin, N=5000, alpha=0.05)
It is rare that we want to compare a control to only one treatment. Therefore, a method to control family-wise error rate is required. The below function will compare an unlimited number of vectors to the first and then perform a Holm correction (Holm, 1797).
medianBootstraps<- function(..., N=5000, alpha=0.05){
results<-NULL
x<-list(...)
reference<-x[[1]]
for(i in 2:(length(x))){
## Calculate observed test statistic
mediandiff<-median(reference)-median(x[[i]])
## Generate the null distribution
boots<-replicate(N, median(sample(reference,length(reference), replace=T))-median(sample(x[[i]],length(x[[i]]), replace=T))-mediandiff)
## Count the number of at resampled observations which are at least as extreme
above <- sum(abs(boots)>=abs(mediandiff))
## Calculate p value and confidence intervals
mcp<-mcp_ci(above+1,N+1, alpha)
results<-rbind(results, mcp)
}
## Calculate observed test statistics
return(cbind(pvaladj=p.adjust(results[,1]),pvaladj_lower=p.adjust(results[,2]),pvaladj_upper=p.adjust(results[,3])))
}
medianBootstraps(Mock,Chitin[1:100],Chitin[101:200],Chitin[201:300], N=5000, alpha=0.05)
## pvaladj pvaladj_lower pvaladj_upper
## [1,] 0.00799840 5.879505e-03 0.010872566
## [2,] 0.00279944 1.356394e-03 0.005773286
## [3,] 0.00059988 3.076982e-05 0.003395589
meanBootstrap<- function(data1, data2, N=5000, alpha=0.05){
## Calculate observed test statistic
meandiff<-mean(data1)-mean(data2)
## Generate the null distribution
boots<-replicate(N, mean(sample(data1,length(data1), replace=T))-mean(sample(data2,length(data2), replace=T))-meandiff)
## Count the number of at resampled observations which are at least as extreme
above <- sum(abs(boots)>=abs(meandiff))
## Calculate p value and confidence intervals
mcp<-mcp_ci(above+1,N+1, alpha)
return(mcp)
}
meanBootstrap(Mock,Chitin, N=5000, alpha=0.05)
## mc_p lower_ci upper_ci
## [1,] 0.00019996 1.025661e-05 0.001131863
To generate the null distribution, the function subtracts the observed median difference from the resampled median difference. This is done so as to correct the data so that the statistical test is conducted with H0 being true (that there is no difference in medians).
This a long hand way of doing this, where the datasets are recentred first to have their medians made the same (e.g. around 0). This is done below and compared to the method presented in the letter.
medianBootstrap_long_plot<- function(data1, data2, N=5000, alpha=0.05){
## Calculate observed test statistic
mediandiff<-median(data1)-median(data2)
## Make H0 true (make the median of both vectors 0)
new_data1 <- data1-median(data1)
new_data2 <- data2-median(data2)
## Generate the null distribution
boots<-replicate(N, median(sample(new_data1,length(new_data1), replace=T))-median(sample(new_data2,length(new_data2), replace=T)))
## Count the number of resampled observations which are at least as extreme
above <- sum(abs(boots)>=abs(mediandiff))
## Calculate p value and confidence intervals
mcp<-format(round(mcp_ci(above+1,N+1, alpha),3),nsmall=3)
## Plot the graph
labeltext1<- substitute(atop(paste(hat(italic("p")))~phantom()==phantom()~AAA,"95% CI"~BBB~CCC), list(AAA=mcp[1], BBB =paste0("[",mcp[2],", "), CCC= paste0(mcp[3],"]")))
plot<-ggplot(data.frame(median=boots), aes(x=abs(median)))+
geom_histogram(alpha=.7, binwidth = 1, aes(y=..density..))+
geom_vline(xintercept = abs(mediandiff), linetype=2, colour = "red4")+
theme_bw()+
xlab(expression(atop("|"~hat(paste(theta[centred], "*"))~"|",theta==Delta[median])))+
ylab("Density")+
annotate("text",x = Inf, y = Inf, hjust = 1.1, vjust = 1.1,label=labeltext1)+
theme(text=element_text(size=15))
return(plot)
}
set.seed(123)
original<-medianBootstrap_plot(Mock,Chitin, N=5000, alpha=0.05)+ggtitle("Subtract observed diff")
set.seed(123)
recentre<-medianBootstrap_long_plot(Mock,Chitin, N=5000, alpha=0.05)+ggtitle("Recentre first")
library(gridExtra)
grid.arrange(original,recentre, ncol=2)
C. Cheval et al., Chitin perception in plasmodesmata characterises submembrane immune signalling specificity in plants. PNAS, 117(17):9621–29, 2020.
S. Holm, A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6:65–70, 1979.
E.B. Wilson, Probable Inference, the Law of Succession, and Statistical Inference. Journal of the American Statistical Association 22: 209–12, 1927.
devtools::session_info()
## - Session info ---------------------------------------------------------------
## setting value
## version R version 4.0.2 (2020-06-22)
## os Windows 10 x64
## system x86_64, mingw32
## ui RTerm
## language (EN)
## collate English_United Kingdom.1252
## ctype English_United Kingdom.1252
## tz Europe/London
## date 2020-12-07
##
## - Packages -------------------------------------------------------------------
## package * version date lib source
## assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.2)
## callr 3.5.1 2020-10-13 [1] CRAN (R 4.0.3)
## cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.0.2)
## cli 2.2.0 2020-11-20 [1] CRAN (R 4.0.3)
## colorspace 2.0-0 2020-11-11 [1] CRAN (R 4.0.3)
## crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.2)
## desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.2)
## devtools 2.3.2 2020-09-18 [1] CRAN (R 4.0.3)
## digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.3)
## dplyr 1.0.2 2020-08-18 [1] CRAN (R 4.0.2)
## ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.2)
## evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.2)
## fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.2)
## farver 2.0.3 2020-01-16 [1] CRAN (R 4.0.2)
## fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
## generics 0.1.0 2020-10-31 [1] CRAN (R 4.0.3)
## ggplot2 * 3.3.2 2020-06-19 [1] CRAN (R 4.0.2)
## glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
## gridExtra * 2.3 2017-09-09 [1] CRAN (R 4.0.2)
## gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.2)
## htmltools 0.5.0 2020-06-16 [1] CRAN (R 4.0.2)
## knitr 1.30 2020-09-22 [1] CRAN (R 4.0.2)
## labeling 0.4.2 2020-10-20 [1] CRAN (R 4.0.3)
## lifecycle 0.2.0 2020-03-06 [1] CRAN (R 4.0.2)
## magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.3)
## memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.3)
## munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.2)
## pillar 1.4.7 2020-11-20 [1] CRAN (R 4.0.3)
## pkgbuild 1.1.0 2020-07-13 [1] CRAN (R 4.0.2)
## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.2)
## pkgload 1.1.0 2020-05-29 [1] CRAN (R 4.0.2)
## prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.2)
## processx 3.4.5 2020-11-30 [1] CRAN (R 4.0.3)
## ps 1.4.0 2020-10-07 [1] CRAN (R 4.0.3)
## purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.2)
## R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.3)
## Rcpp 1.0.5 2020-07-06 [1] CRAN (R 4.0.2)
## readxl * 1.3.1 2019-03-13 [1] CRAN (R 4.0.2)
## remotes 2.2.0 2020-07-21 [1] CRAN (R 4.0.3)
## rlang 0.4.7 2020-07-09 [1] CRAN (R 4.0.2)
## rmarkdown 2.5 2020-10-21 [1] CRAN (R 4.0.3)
## rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.0.3)
## scales 1.1.1 2020-05-11 [1] CRAN (R 4.0.2)
## sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.3)
## stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2)
## stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.2)
## testthat 3.0.0 2020-10-31 [1] CRAN (R 4.0.3)
## tibble 3.0.3 2020-07-10 [1] CRAN (R 4.0.2)
## tidyselect 1.1.0 2020-05-11 [1] CRAN (R 4.0.2)
## usethis 1.6.3 2020-09-17 [1] CRAN (R 4.0.3)
## vctrs 0.3.4 2020-08-29 [1] CRAN (R 4.0.2)
## withr 2.3.0 2020-09-22 [1] CRAN (R 4.0.2)
## xfun 0.19 2020-10-30 [1] CRAN (R 4.0.3)
## yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.2)
##
## [1] C:/Users/mattj/Documents/R/win-library/4.0
## [2] C:/Program Files/R/R-4.0.2/library