This function calculates the intraclass-correlation (icc) - sometimes also called variance partition coefficient (vpc) - for random intercepts of mixed effects models. Currently, merMod, glmmTMB, stanreg and brmsfit objects are supported.
icc(x, ..., posterior = FALSE)
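Argument | Description |
---|---|
x | Fitted mixed effects model (of class merMod, glmmTMB, stanreg or brmsfit). |
... | More fitted model objects, to compute multiple intraclass-correlation coefficients at once. |
posterior | Logical, if TRUE, the ICC and variance components are returned for each sample of the posterior distribution. |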
If posterior = FALSE (the default), a numeric vector with all random intercept intraclass-correlation-coefficients, or a list of numeric vectors when more than one model was passed as argument. Furthermore, between- and within-group variances as well as random-slope variance are returned as attributes.
If posterior = TRUE, icc() returns a data frame with ICC and variance components for each sample of the posterior distribution.
The ICC is calculated by dividing the between-group-variance (random
intercept variance) by the total variance (i.e. sum of between-group-variance
and within-group (residual) variance).
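As a rough illustration of this ratio, the following base R sketch recomputes the ICC from variance components. The helper name icc_from_variances is hypothetical and not part of the package. The numbers are the variance components printed in the 'Examples' for the model Reaction ~ Days + (1 | mygrp) + (1 | Subject). The sketch assumes that, with several random intercepts, all between-group variances enter the total variance, which is consistent with the ICC values printed there.

```r
# Hypothetical helper (not part of the package): ICC per grouping factor,
# computed from between-group variances (tau.00) and residual variance (sigma_2).
icc_from_variances <- function(tau00, sigma2) {
  # total variance = sum of all between-group variances + within-group variance
  total <- sum(tau00) + sigma2
  tau00 / total
}

# Variance components as printed in the 'Examples' (print(icc2, comp = "var"))
tau00  <- c(mygrp = 14.082, Subject = 1381.596)
sigma2 <- 946.474

icc_manual <- icc_from_variances(tau00, sigma2)
round(icc_manual, 4)
```

Up to rounding of the printed inputs, the result should agree with the ICC values 0.006013 (mygrp) and 0.589883 (Subject) shown in the 'Examples'.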
The calculation of the ICC for generalized linear mixed models with binary outcome is based on
Wu et al. (2012). For Poisson multilevel models, please refer to Stryhn et al. (2006).
Aly et al. (2014) describe computation of ICC for negative binomial models.
Caution: For models with random slopes and random intercepts, the ICC differs at each value of the predictors. Hence, the ICC for this kind of model cannot be understood simply as a proportion of variance (see Goldstein et al. 2010). For convenience, because the icc() function also extracts the different random effects variances, the ICC for random-slope-intercept-models is reported nonetheless, but it is usually not a meaningful summary of the proportion of variances.
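To illustrate why a single ICC is hard to interpret for such models, the sketch below evaluates the variance partition at several predictor values. It relies on the standard result that the between-group variance in a model with a random intercept and one random slope is a quadratic function of the predictor x, namely tau.00 + 2 * tau.01 * x + tau.11 * x^2. The function name vpc_at is hypothetical and not part of the package. The numbers are the variance components printed in the 'Examples' for Reaction ~ Days + (Days | Subject).

```r
# Hypothetical helper (not part of the package): variance partition
# coefficient at predictor value x for a random-intercept-slope model.
vpc_at <- function(x, tau00, tau11, tau01, sigma2) {
  # between-group variance depends on x through slope variance and covariance
  between <- tau00 + 2 * tau01 * x + tau11 * x^2
  between / (between + sigma2)
}

# Variance components as printed in the 'Examples' (print(icc1, comp = "var"))
vpc_days <- vpc_at(
  x      = c(0, 3, 6, 9),
  tau00  = 612.090,  # between-group (intercept) variance
  tau11  = 35.072,   # random-slope variance
  tau01  = 9.604,    # slope-intercept covariance
  sigma2 = 654.941   # within-group (residual) variance
)
round(vpc_days, 3)
```

At x = 0 the expression reduces to the ratio reported by icc() (about 0.483). For later days the share rises, reaching roughly 0.85 at Days = 9, so one number cannot describe the whole model.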
If posterior = FALSE, there is a print()-method that prints the variance parameters using the comp-argument set to "var": print(x, comp = "var") (see 'Examples'). The re_var-function is a convenient wrapper. If posterior = TRUE, the print()-method accepts the arguments prob and digits, which indicate the probability of the uncertainty interval for the ICC and variance components, and the digits in the output (see also 'Examples').
The random effect variances indicate the between- and within-group
variances as well as random-slope variance and random-slope-intercept
correlation. The components are denoted as follows:
Within-group (residual) variance: sigma_2
Between-group-variance: tau.00 (variation between individual intercepts and average intercept)
Random-slope-variance: tau.11 (variation between individual slopes and average slope)
Random-Intercept-Slope-covariance: tau.01
Random-Intercept-Slope-correlation: rho.01
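The last two components are linked by the usual definition of a correlation: the covariance divided by the product of the two standard deviations. A minimal base R sketch, using the values printed in the 'Examples' for Reaction ~ Days + (Days | Subject) (the variable names simply mirror the labels above):

```r
# Variance components as printed in the 'Examples' (print(icc1, comp = "var"))
tau.00 <- 612.090  # between-group (intercept) variance
tau.11 <- 35.072   # random-slope variance
tau.01 <- 9.604    # random-intercept-slope covariance

# correlation = covariance / (sd of intercepts * sd of slopes)
rho.01 <- tau.01 / sqrt(tau.00 * tau.11)
round(rho.01, 3)
```

The result should match the slope-intercept correlation of 0.066 printed in the 'Examples'.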
Some notes on why the ICC is useful, based on Grace-Martin:
It can help you determine whether or not a linear mixed model is even necessary. If you find that the correlation is zero, that means the observations within clusters are no more similar than observations from different clusters. Go ahead and use a simpler analysis technique.
It can be theoretically meaningful to understand how much of the overall variation in the response is explained simply by clustering. For example, in a repeated measures psychological study you can tell to what extent mood is a trait (varies among people, but not within a person on different occasions) or state (varies little on average among people, but varies a lot across occasions).
It can also be meaningful to see how the ICC (as well as the between- and within-cluster variances) changes as variables are added to the model.
In short, the ICC can be interpreted as “the proportion of the variance
explained by the grouping structure in the population” (Hox 2002: 15).
Usually, the ICC is calculated for the null model ("unconditional model").
However, according to Raudenbush and Bryk (2002) or
Rabe-Hesketh and Skrondal (2012) it is also feasible to compute the ICC
for full models with covariates ("conditional models") and compare how
much a level-2 variable explains the portion of variation in the grouping
structure (random intercept).
Caution: For three-level-models, depending on the nested structure
of the model, the ICC only reports the proportion of variance explained
for each grouping level. However, the proportion of variance for specific
levels related to each other (e.g., similarity of level-1-units within
level-2-units or level-2-units within level-3-units) must be computed
manually. Use get_re_var to get the between-group-variances and residual variance of the model, and calculate the ICC for the various level correlations.
For example, for the ICC between level 1 and 2:
sum(get_re_var(fit)) / (sum(get_re_var(fit)) + get_re_var(fit, "sigma_2"))
or for the ICC between level 2 and 3:
get_re_var(fit)[2] / sum(get_re_var(fit))
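The two expressions above can be traced with plain numbers. The following sketch uses made-up variances in place of a fitted model: tau stands in for the result of get_re_var(fit) and sigma_2 for get_re_var(fit, "sigma_2"). It assumes that the second element of tau belongs to the level-3 grouping factor. The order in a real model depends on the fit and should be checked.

```r
# Hypothetical variance components (not taken from a fitted model)
tau     <- c(level2 = 40, level3 = 60)  # stand-in for get_re_var(fit)
sigma_2 <- 100                          # stand-in for get_re_var(fit, "sigma_2")

# ICC between level 1 and 2: all between-group variance over total variance
icc_l1_l2 <- sum(tau) / (sum(tau) + sigma_2)

# ICC between level 2 and 3: level-3 variance over all between-group variance
icc_l2_l3 <- tau[[2]] / sum(tau)

c(level_1_2 = icc_l1_l2, level_2_3 = icc_l2_l3)
```

With these made-up values, the first ratio is 100 / 200 and the second is 60 / 100.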
Aguinis H, Gottfredson RK, Culpepper SA. 2013. Best-Practice Recommendations for Estimating Cross-Level Interaction Effects Using Multilevel Modeling. Journal of Management 39(6): 1490–1528 (doi: 10.1177/0149206313478188)
Aly SS, Zhao J, Li B, Jiang J. 2014. Reliability of environmental sampling culture results using the negative binomial intraclass correlation coefficient. Springerplus 14(3) (doi: 10.1186/2193-1801-3-40)
Goldstein H, Browne W, Rasbash J. 2010. Partitioning Variation in Multilevel Models. Understanding Statistics, 1:4, 223-231 (doi: 10.1207/S15328031US0104_02)
Grace-Martin K. The Intraclass Correlation Coefficient in Mixed Models, web
Hox J. 2002. Multilevel analysis: techniques and applications. Mahwah, NJ: Erlbaum
Rabe-Hesketh S, Skrondal A. 2012. Multilevel and longitudinal modeling using Stata. 3rd ed. College Station, Tex: Stata Press Publication
Raudenbush SW, Bryk AS. 2002. Hierarchical linear models: applications and data analysis methods. 2nd ed. Thousand Oaks: Sage Publications
Stryhn H, Sanchez J, Morley P, Booker C, Dohoo IR. 2006. Interpretation of variance parameters in multilevel Poisson regression models. Proceedings of the 11th International Symposium on Veterinary Epidemiology and Economics, 2006. Available at http://www.sciquest.org.nz/node/64294
Wu S, Crespi CM, Wong WK. 2012. Comparison of methods for estimating the intraclass correlation coefficient for binary responses in cancer prevention cluster randomized trials. Contemporary Clinical Trials 33: 869-880 (doi: 10.1016/j.cct.2012.05.004)
Further helpful online resources:
CrossValidated (2012) Intraclass correlation (ICC) for an interaction?
CrossValidated (2014) Interpreting the random effect in a mixed-effect model
CrossValidated (2014) how to partition the variance explained at group level and individual level
library(sjstats)
library(lme4)

# null model with random intercept only
fit0 <- lmer(Reaction ~ 1 + (1 | Subject), sleepstudy)
icc(fit0)
#>
#> Linear mixed model
#>  Family: gaussian (identity)
#> Formula: Reaction ~ 1 + (1 | Subject)
#>
#>   ICC (Subject): 0.394890

# note: ICC for random-slope-intercept model usually not
# meaningful - see 'Note'.
fit1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
icc(fit1)
#>
#> Linear mixed model
#>  Family: gaussian (identity)
#> Formula: Reaction ~ Days + (Days | Subject)
#>
#>   ICC (Subject): 0.483090

sleepstudy$mygrp <- sample(1:45, size = 180, replace = TRUE)
fit2 <- lmer(Reaction ~ Days + (1 | mygrp) + (1 | Subject), sleepstudy)
icc(fit2)
#>
#> Linear mixed model
#>  Family: gaussian (identity)
#> Formula: Reaction ~ Days + (1 | mygrp) + (1 | Subject)
#>
#>     ICC (mygrp): 0.006013
#>   ICC (Subject): 0.589883

# return icc for all models at once
icc(fit0, fit1, fit2)
#> [[1]]
#>
#> Linear mixed model
#>  Family: gaussian (identity)
#> Formula: Reaction ~ 1 + (1 | Subject)
#>
#>   ICC (Subject): 0.394890
#>
#> [[2]]
#>
#> Linear mixed model
#>  Family: gaussian (identity)
#> Formula: Reaction ~ Days + (Days | Subject)
#>
#>   ICC (Subject): 0.483090
#>
#> [[3]]
#>
#> Linear mixed model
#>  Family: gaussian (identity)
#> Formula: Reaction ~ Days + (1 | mygrp) + (1 | Subject)
#>
#>     ICC (mygrp): 0.006013
#>   ICC (Subject): 0.589883
#>

icc1 <- icc(fit1)
icc2 <- icc(fit2)

# print variance components instead of the ICC
print(icc1, comp = "var")
#>
#> Linear mixed model
#>  Family: gaussian (identity)
#> Formula: Reaction ~ Days + (Days | Subject)
#>
#>       Within-group-variance:  654.941
#>      Between-group-variance:  612.090 (Subject)
#>       Random-slope-variance:   35.072 (Subject.Days)
#>  Slope-Intercept-covariance:    9.604 (Subject)
#> Slope-Intercept-correlation:    0.066 (Subject)

print(icc2, comp = "var")
#>
#> Linear mixed model
#>  Family: gaussian (identity)
#> Formula: Reaction ~ Days + (1 | mygrp) + (1 | Subject)
#>
#>       Within-group-variance:  946.474
#>      Between-group-variance:   14.082 (mygrp)
#>      Between-group-variance: 1381.596 (Subject)

# NOT RUN {
# compute ICC for Bayesian mixed model, with an ICC for each
# sample of the posterior. The print()-method then shows
# the median ICC as well as 89% HDI for the ICC.
# Change interval with print-method:
# print(icc(m, posterior = TRUE), prob = .5)

if (requireNamespace("brms", quietly = TRUE)) {
  library(dplyr)
  sleepstudy$mygrp <- sample(1:5, size = 180, replace = TRUE)
  sleepstudy <- sleepstudy %>%
    group_by(mygrp) %>%
    mutate(mysubgrp = sample(1:30, size = n(), replace = TRUE))

  m <- brms::brm(
    Reaction ~ Days + (1 | mygrp / mysubgrp) + (1 | Subject),
    data = sleepstudy
  )

  # by default, 89% interval
  icc(m, posterior = TRUE)

  # show 50% interval
  print(icc(m, posterior = TRUE), prob = .5, digits = 3)
}
# }