The ComBat Family extends the original ComBat methodology to enable flexible covariate modeling, leveraging efficient R implementations of regression models. A method that belongs in the ComBat Family satisfies the following conditions:
Denote data \(y_{ijv}\) and covariates \(\boldsymbol{x}_{ij}\) for sites \(i = 1,2,\ldots,M\), subjects within site \(j = 1,2,\ldots,n_i\), and features \(v = 1,2,\ldots,p\). The ComBat Family assumes
\[y_{ijv} \sim D(\mu_{ijv} + \gamma_{iv}, \delta_{iv}\sigma_{ijv})\] where \(D\) is a specified distribution (typically normal), \(\gamma_{iv}\) is the additive location batch effect and \(\delta_{iv}\) is the multiplicative scale batch effect. The mean and variance of \(D\) are modeled via
\[g_1(\mu_{ijv}) = f_v(\boldsymbol{x}_{ij}) \qquad g_2(\sigma_{ijv}) = h_v(\boldsymbol{x}_{ij})\]
where \(g_1, g_2\) are link functions and \(f_v, h_v\) are specified functions.
ComBat Family defaults to the original linear model used in ComBat,
using formula to specify covariates included in the
model.
suppressPackageStartupMessages({
library(neuroCombat)
library(ComBatFamily)
})
# generate toy dataset
set.seed(8888)
n <- 20
p <- 5
bat <- as.factor(c(rep("a", n/2), rep("b", n/2)))
q <- 2
covar <- matrix(rnorm(n*q), n, q)
colnames(covar) <- paste0("x", 1:q)
data <- data.frame(matrix(rnorm(n*p), n, p))
cf <- comfam(data, bat, covar, lm, formula = y ~ x1 + x2)
c <- neuroCombat(t(data), bat, covar, verbose = FALSE)
max(cf$dat.combat - t(c$dat.combat))
#> [1] 1.332268e-15ComBat-GAM (Pomponio et al., 2020) was previously only available in
Python. ComBat Family enables fitting of a general additive model (GAM)
using notation consistent with the gam function.
ComBatLS (Gardner et al.) is recent location- and scale-preserving method for multi-site image harmonization. ComBatLS models covariate effects in the location and scale of measurements via GAMLSS. ComBat-LS should be considered in settings with potential covariate effects on scale (e.g. age and sex affecting the variance of imaging features).
suppressPackageStartupMessages(
library(gamlss)
)
cgl <- combatls(data, bat, covar, y ~ x1 + x2,
sigma.formula = ~ x1 + x2,
control = gamlss.control(trace = FALSE))
# # alternate syntax
# cgl <- comfam(data, bat, covar, gamlss, y ~ x1 + x2,
# sigma.formula = ~ x1 + x2,
# control = gamlss.control(trace = FALSE))The ComBat family includes Longitudinal ComBat (Beer et al. 2020),
previously implemented here. Currently,
comfam uses the mean squared residual (MSR) estimator for
error variance, but the REML estimator will be implemented at a later
date.
# devtools::install_github("jcbeer/longCombat")
suppressPackageStartupMessages({
library(longCombat)
library(lme4)
})
# generate toy dataset
n <- 20
p <- 5
r <- 2 # repeats
bat <- as.factor(rep(c(rep("a", n/2), rep("b", n/2)), r))
q <- 2
covar <- matrix(rnorm(n*r*q), n*r, q)
colnames(covar) <- paste0("x", 1:q)
covar <- cbind(covar, ID = rep(1:n, r))
data <- data.frame(matrix(rnorm(n*r*p), n*r, p))
cf <- long_combat(data, bat, covar, y ~ x1 + x2 + (1 | ID))
# # alternate syntax
# cf <- comfam(data, bat, covar, lmer, y ~ x1 + x2 + (1 | ID))
c <- longCombat("ID", "x1", "bat", 1:p, "x1 + x2", "(1 | ID)",
data = cbind(data, covar, bat), method = "MSR", verbose = FALSE)
max(cf$dat.combat - c$data_combat[,-(1:3)])
#> [1] 4.171635e-09Beer, J. C., Tustison, N. J., Cook, P. A., Davatzikos, C., Sheline, Y. I., Shinohara, R. T., & Linn, K. A. (2020). Longitudinal ComBat: A method for harmonizing longitudinal multi-scanner imaging data. NeuroImage, 220, 117129. https://doi.org/10.1016/j.neuroimage.2020.117129
Johnson, W. E., Li, C., & Rabinovic, A. (2007). Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics, 8(1), 118–127. https://doi.org/10.1093/biostatistics/kxj037
Gardner, M., Shinohara, R. T., Bethlehem, R. A. I., Romero-Garcia, R., Warrier, V., Dorfschmidt, L., Shanmugan, S., Seidlitz, J., Alexander-Bloch, A., & Chen, A. A. (2024). ComBatLS: A location- and scale-preserving method for multi-site image harmonization. bioRxiv, 2024.06.21.599875. https://doi.org/10.1101/2024.06.21.599875
Pomponio, R., Erus, G., Habes, M., Doshi, J., Srinivasan, D., Mamourian, E., Bashyam, V., Nasrallah, I. M., Satterthwaite, T. D., Fan, Y., Launer, L. J., Masters, C. L., Maruff, P., Zhuo, C., Völzke, H., Johnson, S. C., Fripp, J., Koutsouleris, N., Wolf, D. H., … Davatzikos, C. (2020). Harmonization of large MRI datasets for the analysis of brain imaging patterns throughout the lifespan. NeuroImage, 208, 116450. https://doi.org/10.1016/j.neuroimage.2019.116450