ggpredict()
computes predicted (fitted) values for the
response, at the margin of specific values from certain model terms,
where additional model terms indicate the grouping structure.
ggaverage()
computes the average predicted values.
The result is returned as tidy data frame.
mem()
is an alias for ggpredict()
(marginal effects
at the mean), ame()
is an alias for ggaverage()
(average marginal effects).
ggaverage(model, terms, ci.lvl = 0.95, type = c("fe", "re"), typical = "mean", ppd = FALSE, ...) ame(model, terms, ci.lvl = 0.95, type = c("fe", "re"), typical = "mean", ppd = FALSE, ...) ggpredict(model, terms, ci.lvl = 0.95, type = c("fe", "re"), full.data = FALSE, typical = "mean", ppd = FALSE, x.as.factor = FALSE, pretty = TRUE, ...) mem(model, terms, ci.lvl = 0.95, type = c("fe", "re"), full.data = FALSE, typical = "mean", ppd = FALSE, x.as.factor = FALSE, ...)
model | A fitted model object, or a list of model objects. Any model
that supports common methods like |
---|---|
terms | Character vector with the names of those terms from |
ci.lvl | Numeric, the level of the confidence intervals. For |
type | Character, only applies for mixed effects models. Indicates
whether predicted values should be conditioned on random effects
( |
typical | Character vector, naming the function to be applied to the
covariates over which the effect is "averaged". The default is "mean".
See |
ppd | Logical, if |
... | Further arguments passed down to |
full.data | Logical, if |
x.as.factor | Logical, if |
pretty | Logical, if |
A tibble (with ggeffects
class attribute) with consistent data columns:
x
the values of the first term in terms
, used as x-position in plots.
predicted
the predicted values of the response, used as y-position in plots.
conf.low
the lower bound of the confidence interval for the predicted values.
conf.high
the upper bound of the confidence interval for the predicted values.
observed
if full.data = TRUE
, this columns contains the observed values (the response vector).
residuals
if full.data = TRUE
, this columns contains residuals.
group
the grouping level from the second term in terms
, used as grouping-aesthetics in plots.
facet
the grouping level from the third term in terms
, used to indicate facets in plots.
For proportional odds logistic regression (see polr
)
resp. cumulative link models (e.g., see clm
),
an additional column response.level
is returned, which indicates
the grouping of predictions based on the level of the model's response.
Currently supported model-objects are: lm, glm, glm.nb, lme, lmer,
glmer, glmer.nb, nlmer, glmmTMB, gam, vgam, gamm, gamm4, multinom,
betareg, gls, gee, plm, lrm, polr, clm, hurdle, zeroinfl, svyglm,
svyglm.nb, truncreg, coxph, stanreg, brmsfit
.
Other models not listed here are passed to a generic predict-function
and might work as well, or maybe with ggeffect()
, which
effectively does the same as ggpredict()
. The main difference
between ggpredict()
and ggeffect()
is how factors are
held constant: ggpredict()
uses the reference level, while
ggeffect()
computes a kind of "average" value, which represents
the proportions of each factor's category.
For ggpredict()
, if full.data = FALSE
, expand.grid()
is called on all unique combinations of model.frame(model)[, terms]
and used as newdata
-argument for predict()
. In this case,
all remaining covariates that are not specified in terms
are
held constant. Numeric values are set to the mean (unless changed
with the typical
-argument), factors are set to their
reference level and character vectors to their mode (most common
element).
ggaverage()
computes the average predicted values, by calling
ggpredict()
with full.data = TRUE
, where argument
newdata = model.frame(model)
is used in predict()
.
Hence, predictions are made on the model data. In this case, all
remaining covariates that are not specified in terms
are
not held constant, but vary between observations (and are
kept as they happen to be). The predicted values are then averaged
for each group (if any).
Thus, ggpredict()
can be considered as calculating marginal
effects at the mean, while ggaverage()
computes average
marginal effects.
ggpredict()
also works with Stan-models from
the rstanarm or brms-package. The predicted
values are the median value of all drawn posterior samples. The
confidence intervals for Stan-models are actually high density
intervals, computed by hdi
, unless ppd = TRUE
.
If ppd = TRUE
, predictions are based on draws of the posterior
predictive distribution and the uncertainty interval is computed
using predictive_interval
. By default (i.e.
ppd = FALSE
), the predictions are based on
posterior_linpred
and hence have some
limitations: the uncertainty of the error term is not taken into
account. The recommendation is to use the posterior predictive
distribution (posterior_predict
).
Note that for binomial models, the newdata
-argument
used in posterior_predict()
must also contain the vector
with the number of trials. In this case, a dummy-vector is used,
where all values for the response are set to 1.
Since data for ggaverage()
comes from the model frame, not all
possible combinations of values in terms
might be present in the data,
thus lines or confidence bands from plot()
might not span over
the complete x-axis-range.
There are some limitations for certain model objects. For example,
it is currently only possible to compute predicted risk scores for
coxph
-models, but not expected number of events nor survival
probabilities.
polr
- or clm
-models have an additional column
response.level
, which indicates with which level of the response
variable the predicted values are associated.
data(efc) fit <- lm(barthtot ~ c12hour + neg_c_7 + c161sex + c172code, data = efc) ggpredict(fit, terms = "c12hour")#>#> # A tibble: 18 x 5 #> x predicted conf.low conf.high group #> <dbl> <dbl> <dbl> <dbl> <fct> #> 1 0 75.4 73.3 77.6 1 #> 2 10 72.9 70.9 74.9 1 #> 3 20 70.4 68.6 72.2 1 #> 4 30 67.8 66.1 69.5 1 #> 5 40 65.3 63.7 67.0 1 #> 6 50 62.8 61.1 64.4 1 #> 7 60 60.2 58.5 62.0 1 #> 8 70 57.7 55.8 59.6 1 #> 9 80 55.2 53.1 57.3 1 #> 10 90 52.6 50.3 55.0 1 #> 11 100 50.1 47.5 52.7 1 #> 12 110 47.6 44.7 50.4 1 #> 13 120 45.0 41.9 48.2 1 #> 14 130 42.5 39.1 45.9 1 #> 15 140 40.0 36.3 43.7 1 #> 16 150 37.4 33.4 41.5 1 #> 17 160 34.9 30.6 39.2 1 #> 18 170 32.4 27.7 37.0 1ggpredict(fit, terms = "c12hour", full.data = TRUE)#> # A tibble: 815 x 7 #> x predicted conf.low conf.high observed residuals group #> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct> #> 1 4 78.2 74.7 81.7 80 1.80 1 #> 2 4 84.8 80.9 88.6 75 -9.77 1 #> 3 4 78.6 74.3 82.9 75 -3.60 1 #> 4 4 83.2 78.7 87.7 100 16.8 1 #> 5 4 74.0 69.7 78.3 95 21.0 1 #> 6 4 74.3 72.0 76.7 15 -59.3 1 #> 7 4 69.7 67.1 72.3 50 -19.7 1 #> 8 4 65.1 62.0 68.3 95 29.9 1 #> 9 4 71.7 67.3 76.1 95 23.3 1 #> 10 4 74.3 72.0 76.7 80 5.67 1 #> # ... with 805 more rowsggpredict(fit, terms = c("c12hour", "c172code"))#>#> # A tibble: 54 x 5 #> x predicted conf.low conf.high group #> <dbl> <dbl> <dbl> <dbl> <fct> #> 1 0 74.7 71.3 78.2 low level of education #> 2 0 75.5 73.3 77.6 intermediate level of education #> 3 0 76.2 72.8 79.5 high level of education #> 4 10 72.2 68.9 75.5 low level of education #> 5 10 72.9 71.0 74.9 intermediate level of education #> 6 10 73.7 70.4 76.9 high level of education #> 7 20 69.7 66.5 72.9 low level of education #> 8 20 70.4 68.6 72.2 intermediate level of education #> 9 20 71.1 67.9 74.3 high level of education #> 10 30 67.1 64.0 70.3 low level of education #> # ... with 44 more rowsggpredict(fit, terms = c("c12hour", "c172code", "c161sex"))#>#> # A tibble: 108 x 6 #> x predicted conf.low conf.high group facet #> <dbl> <dbl> <dbl> <dbl> <fct> <fct> #> 1 0 74.0 69.4 78.6 low level of education [1] Male #> 2 0 75.0 71.4 78.6 low level of education [2] Fema~ #> 3 0 74.7 71.1 78.3 intermediate level of education [1] Male #> 4 0 75.7 73.3 78.1 intermediate level of education [2] Fema~ #> 5 0 75.4 71.0 79.7 high level of education [1] Male #> 6 0 76.4 72.9 80.0 high level of education [2] Fema~ #> 7 10 71.4 66.9 75.9 low level of education [1] Male #> 8 10 72.5 69.0 75.9 low level of education [2] Fema~ #> 9 10 72.1 68.6 75.7 intermediate level of education [1] Male #> 10 10 73.2 71.0 75.4 intermediate level of education [2] Fema~ #> # ... with 98 more rows# only range of 40 to 60 for variable 'c12hour' ggpredict(fit, terms = "c12hour [40:60]")#> # A tibble: 21 x 5 #> x predicted conf.low conf.high group #> <dbl> <dbl> <dbl> <dbl> <fct> #> 1 40 65.3 63.7 67.0 1 #> 2 41 65.1 63.4 66.7 1 #> 3 42 64.8 63.2 66.5 1 #> 4 43 64.6 62.9 66.2 1 #> 5 44 64.3 62.6 65.9 1 #> 6 45 64.0 62.4 65.7 1 #> 7 46 63.8 62.1 65.4 1 #> 8 47 63.5 61.9 65.2 1 #> 9 48 63.3 61.6 64.9 1 #> 10 49 63.0 61.4 64.7 1 #> # ... with 11 more rows# to plot ggeffects-objects, you can use the 'plot()'-function. # the following examples show how to build your ggplot by hand. # plot predicted values, remaining covariates held constant library(ggplot2) mydf <- ggpredict(fit, terms = "c12hour")#>ggplot(mydf, aes(x, predicted)) + geom_line() + geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = .1)# with "full.data = TRUE", remaining covariates vary between # observations, so fitted values can be plotted mydf <- ggpredict(fit, terms = "c12hour", full.data = TRUE) ggplot(mydf, aes(x, predicted)) + geom_point()# you can add a smoothing-geom to show the linear trend of fitted values ggplot(mydf, aes(x, predicted)) + geom_smooth(method = "lm", se = FALSE) + geom_point()# three variables, so we can use facets and groups mydf <- ggpredict( fit, terms = c("c12hour", "c161sex", "c172code"), full.data = TRUE ) ggplot(mydf, aes(x = x, y = predicted, colour = group)) + stat_smooth(method = "lm", se = FALSE) + facet_wrap(~facet, ncol = 2)# average marginal effects mydf <- ggaverage(fit, terms = c("c12hour", "c172code")) ggplot(mydf, aes(x = x, y = predicted, colour = group)) + stat_smooth(method = "lm", se = FALSE)# select specific levels for grouping terms mydf <- ggpredict(fit, terms = c("c12hour", "c172code [1,3]", "c161sex"))#>ggplot(mydf, aes(x = x, y = predicted, colour = group)) + stat_smooth(method = "lm", se = FALSE) + facet_wrap(~facet) + labs( y = get_y_title(mydf), x = get_x_title(mydf), colour = get_legend_title(mydf) )# level indication also works for factors with non-numeric levels # and in combination with numeric levels for other variables library(sjlabelled) data(efc) efc$c172code <- as_label(efc$c172code) fit <- lm(barthtot ~ c12hour + neg_c_7 + c161sex + c172code, data = efc) ggpredict(fit, terms = c("c12hour", "c172code [low level of education, high level of education]", "c161sex [1]"))#>#> # A tibble: 36 x 6 #> x predicted conf.low conf.high group facet #> <dbl> <dbl> <dbl> <dbl> <fct> <fct> #> 1 0 72.8 67.9 77.7 low level of education [1] Male #> 2 0 74.0 69.2 78.8 high level of education [1] Male #> 3 10 70.3 65.5 75.1 low level of education [1] Male #> 4 10 71.5 66.8 76.2 high level of education [1] Male #> 5 20 67.7 63.0 72.5 low level of education [1] Male #> 6 20 69.0 64.3 73.7 high level of education [1] Male #> 7 30 65.2 60.5 69.9 low level of education [1] Male #> 8 30 66.4 61.7 71.1 high level of education [1] Male #> 9 40 62.7 58.0 67.3 low level of education [1] Male #> 10 40 63.9 59.2 68.6 high level of education [1] Male #> # ... with 26 more rows# use categorical value on x-axis, use axis-labels, add error bars dat <- ggpredict(fit, terms = c("c172code", "c161sex")) ggplot(dat, aes(x, predicted, colour = group)) + geom_point(position = position_dodge(.1)) + geom_errorbar( aes(ymin = conf.low, ymax = conf.high), position = position_dodge(.1) ) + scale_x_continuous(breaks = 1:3, labels = get_x_labels(dat))# 3-way-interaction with 2 continuous variables data(efc) # make categorical efc$c161sex <- as_factor(efc$c161sex) fit <- lm(neg_c_7 ~ c12hour * barthtot * c161sex, data = efc) # select only levels 30, 50 and 70 from continuous variable Barthel-Index dat <- ggpredict(fit, terms = c("c12hour", "barthtot [30,50,70]", "c161sex"))#>ggplot(dat, aes(x = x, y = predicted, colour = group)) + stat_smooth(method = "lm", se = FALSE, fullrange = TRUE) + facet_wrap(~facet) + labs( colour = get_legend_title(dat), x = get_x_title(dat), y = get_y_title(dat), title = get_title(dat) )# or with ggeffects' plot-method# NOT RUN { plot(dat, ci = FALSE) # }# use factor levels as x-column in returned data frame data(efc) efc$c161sex <- as_label(efc$c161sex) fit <- lm(neg_c_7 ~ c12hour + c161sex, data = efc) ggpredict(fit, terms = "c161sex", x.as.factor = TRUE)#> # A tibble: 2 x 5 #> x predicted conf.low conf.high group #> <fct> <dbl> <dbl> <dbl> <fct> #> 1 Male 11.5 11.0 12.0 1 #> 2 Female 12.0 11.7 12.2 1