CovBat Family Overview

Andrew Chen

2023-10-20

The CovBat Family extends the original CovBat methodology to enable flexible covariate modeling, leveraging efficient R implementations of regression models. A method that belongs in the CovBat Family satisfies the following conditions:

  1. Modeling of covariate effects in location and/or scale
  2. Batch effects in mean and covariance of measurements
  3. Empirical Bayes step for borrowing information across features
  4. Adjustment of multivariate covariate and batch effects through principal components (PCs) of ComBat residuals

CovBat

CovBat Family defaults to the original linear model and parameters used in CovBat

suppressPackageStartupMessages({
  library(CovBat)
  library(ComBatFamily)
})

# generate toy dataset
set.seed(8888)
n <- 20
p <- 5
bat <- as.factor(c(rep("a", n/2), rep("b", n/2)))
q <- 2
covar <- matrix(rnorm(n*q), n, q)
colnames(covar) <- paste0("x", 1:q)
data <- data.frame(matrix(rnorm(n*p), n, p))

cf <- covfam(data, bat, covar, lm, formula = y ~ x1 + x2)
c <- covbat(t(data), bat, covar)

max(cf$dat.covbat - t(c$dat.covbat))
#> [1] 3.108624e-15
# no covariates specified
cf <- covfam(data, bat)
c <- covbat(t(data), bat)

max(cf$dat.covbat - t(c$dat.covbat))
#> [1] 3.663736e-15

CovBat-GAM

Modeling covariates via a general additive model (GAM) is part of the CovBat family and can be easily implemented.

suppressPackageStartupMessages(
  library(mgcv)
)
cg <- covfam(data, bat, covar, gam, formula = y ~ s(x1) + x2)

CovBat with modeling of scores

CovBat Family extends the original CovBat method by enabling modeling of covariate effects in the principal component (PC) scores. The model is estimated separately from the model used in the standardization step.

# no covariates specified
cf <- covfam(data, bat, covar, formula = y ~ x1 + x2, score.model = lm,
             score.args = list(formula = y ~ x1 + x2))

References

Chen, A. A., Beer, J. C., Tustison, N. J., Cook, P. A., Shinohara, R. T., Shou, H., & Initiative, T. A. D. N. (2022). Mitigating site effects in covariance for machine learning in neuroimaging data. Human Brain Mapping, 43(4), 1179–1195. https://doi.org/10.1002/hbm.25688