Data and analysis supplement
Whalley, B, Solbrig, L, Kavanagh, DJ, May, J, Parkin, T, Jones, R, Andrade, J
Data and analysis supplement for: Functional Imagery Training versus Motivational Interviewing for Weight Loss: A randomised controlled trial of brief individual interventions for overweight and obesity.
This document provides supplementary information on the analysis in the main paper.
Tables and figures shown here are compiled from an RMarkdown source document, also provided, which provides a complete account of the analyses reported. For clarity some R output is suppressed in the compiled document.
Documents provided to support this supplement include:
data_supplement.html
(the compiled data supplement).
data_supplement.Rmd
(the code defining the analysis, which compiles to produce the supplement).
blind_data.csv
(the complete dataset required to reproduce the analysis)
costing e-tool.xlsx
(completed Public Health England costing tool)
R Dependencies
It’s suggested to use the R checkpoint library to install the relevant versions of requied R packages.
Alternatively, just run options("repos" = "https://mran.microsoft.com/snapshot/2018-05-01")
prior to installation to ensure consistent package versions.
Data setup
Dataframes in a suitable format for our analyses were created as follows:
bmi <- function(height, weight) weight / height^2
df <- readr::read_csv("blind_data.csv") %>%
filter(!is.na(group)) %>%
mutate(person = row_number()) %>%
mutate(female = gender == "f") %>%
select(-gender) %>%
mutate(
bmi1 = bmi(height, kg1),
bmi2 = bmi(height, kg2),
bmi3 = bmi(height, kg3)
) %>%
# These two lines were added after the analyses had been completed, allowing us to properly
# label tables and figures. All analyses were initially conducted blind to condition.
mutate(group = factor(ifelse(group == 1, "MI", "FIT"))) %>%
mutate(group = relevel(group, ref = "MI"))
patient.chars <- df %>%
select(person, age, female, height)
df_long <-
df %>%
select(
contains("kg"), contains("cm"), contains("bmi"), contains("gqol"),
person, group
) %>%
gather(variable, value, -person, -group) %>%
# regex to extract variable names and followup point from column names
mutate(
var = stringr::str_extract(variable, "\\w+\\D"),
time = as.numeric(stringr::str_extract(variable, "\\d"))
) %>%
reshape2::dcast(person + time + group ~ var) %>%
# make baseline scores
arrange(person, time) %>%
group_by(person) %>%
mutate(cm_base = first(cm), kg_base = first(kg), bmi_base = first(bmi)) %>%
ungroup() %>%
# center baselines for ease of interpretation
mutate_at(vars(contains("_base")), funs(scale(., center = T, scale = F))) %>%
mutate(time = factor(time, labels = c("Baseline", "Month 6", "Month 12"))) %>%
# merge age etc, which are constant per-participant, back to this long DF
left_join(., patient.chars)
# Version without baseline outcomes for primary analyses
df_long_nobase <- df_long %>%
filter(time != "Baseline")
Save intermediate data for later use:
write_csv(df, "intermediate-data/df.csv")
write_csv(df_long, "intermediate-data/df_long.csv")
For reference, print group means and N at each followup:
df_long %>%
group_by(time, group) %>%
summarise(mean(kg, na.rm = T), sd(kg, na.rm = T), sum(!is.na(kg))) %>%
pander()
Baseline |
MI |
89.13 |
14.76 |
58 |
Baseline |
FIT |
90.48 |
15.90 |
63 |
Month 6 |
MI |
88.39 |
15.72 |
55 |
Month 6 |
FIT |
86.37 |
15.07 |
59 |
Month 12 |
MI |
88.46 |
15.34 |
53 |
Month 12 |
FIT |
84.04 |
15.96 |
59 |
And the change from baseline and N at 6 and 12 months in each group:
df %>%
group_by(group) %>%
summarise(sd(kg1 - kg2, na.rm = T),
sd(kg1 - kg3, na.rm = T),
N_m6 = sum(!is.na(kg1)&!is.na(kg2)),
N_m12 = sum(!is.na(kg1)&!is.na(kg3))) %>%
# add total N
bind_rows(., summarise_at(., vars(starts_with("N")), sum)) %>%
pander(missing="")
MI |
2.057 |
2.319 |
55 |
53 |
FIT |
4.180 |
6.275 |
59 |
59 |
|
|
|
114 |
112 |
Missing data
Very little data in the study was lost to followup: only 9 participants failed to provide complete data, and only 7 failed to provide enough data to be included in our primary outcome models:
xs <- function(x) ifelse(x == 1, "x", ".")
df %>%
select(matches("^kg|^cm|^gqol")) %>%
mice::md.pattern() %>%
tidy() %>% rename(Count=.rownames) %>%
select(-V9) %>%
mutate_at(funs(xs), .vars = vars(-Count)) %>%
select(Count, noquote(order(colnames(.)))) %>%
filter(Count != "") %>%
pander(caption = "Patterns of data missing or lost-to-followup.")
Patterns of data missing or lost-to-followup.
112 |
x |
x |
x |
x |
x |
x |
x |
x |
2 |
x |
x |
. |
x |
x |
x |
x |
. |
7 |
x |
. |
. |
x |
. |
x |
. |
. |
Comparison of groups at baseline
We reported a table of baseline characteristics of participants in each group. In the text we futher reported that none of these baseline differences were statistically significant, based on the analyses here.
These functions run linear or logistic regression models, returning a test for the group indicator:
vars_to_compare <- c("cm1", "kg1", "bmi1", "gqol1", "age")
binary_vars_to_compare <- c("female")
# Helper functions for baseline comparisons, all return dataframes
comparegroups_cont <- function(f) lm(f, data = df) %>% broom::tidy()
comparegroups_binary <- function(f) {
glm(f, data = df, family = binomial(link = "logit")) %>%
broom::tidy() %>%
# Convert logit to OR. Remember glm returns z not t
mutate_at(vars(estimate, std.error), funs(exp))
}
# not actually used in the end
# comparegroups_categ <- function(f){
# # as an example, remember glm returns z stat not t
# chisq.test(xtabs(f, data=df)) %>%
# broom::tidy() %>%
# rename(df = parameter)
# }
We then run models for all continuous variables:
# compare all continuous variables
conts <- data_frame(model = paste(vars_to_compare, "~ group")) %>%
group_by(model) %>%
do(., comparegroups_cont(.$model))
And all binary variables:
binas <- data_frame(model = paste(binary_vars_to_compare, "~ group")) %>%
group_by(model) %>%
do(., comparegroups_binary(.$model))
And combine in a single table:
round.p <- function(p){
if (p<.001) p <- .001
return(sprintf("%.3f", p))
}
both <- bind_rows(conts, binas) %>%
filter(term == "groupFIT") %>%
select(-term) %>%
setNames(c("Model", "Difference/OR", "Std. Error", "Statistic", "p"))
apastats::round.p
## function (values, include.rel = 1, digits = 3, strip.lead.zeros = T,
## replace.very.small = 0.001)
## {
## values <- as.numeric(values)
## rel <- ifelse(include.rel, "= ", "")
## values_string <- format(round(values, digits = digits), nsmall = digits)
## if (strip.lead.zeros) {
## values_string <- sub("^0", "", values_string)
## }
## values_string <- paste(rel, values_string, sep = "")
## if (!is.null(replace.very.small)) {
## values_string[abs(values) <= replace.very.small] <- paste0("< ",
## ifelse(strip.lead.zeros, sub("^0", "", replace.very.small),
## replace.very.small))
## }
## values_string
## }
## <bytecode: 0x7f9ac98c83c0>
## <environment: namespace:apastats>
both %>%
mutate(p = round.p(p), p = str_replace(p, "=", "")) %>%
pander::pander(caption = "Baseline between-group comparisons (mean difference and t-statistic for continuous outcomes; odds ratio and z for binary variables; chisq for categorical variables).")
Baseline between-group comparisons (mean difference and t-statistic for continuous outcomes; odds ratio and z for binary variables; chisq for categorical variables).
age ~ group |
2.000 |
2.558 |
0.782 |
0.436 |
bmi1 ~ group |
0.408 |
0.988 |
0.413 |
0.680 |
cm1 ~ group |
0.567 |
2.397 |
0.236 |
0.813 |
gqol1 ~ group |
0.420 |
2.296 |
0.183 |
0.855 |
kg1 ~ group |
1.355 |
2.796 |
0.485 |
0.629 |
female ~ group |
0.944 |
1.511 |
-0.140 |
0.889 |
Outcome plots
We report a figure displaying raw outcome data:
primary_outcomes_raw <- df_long %>%
select(-contains("base"), -age, -female, -height, -bmi, -gqol) %>%
reshape2::melt(id.vars = c("group", "person", "time")) %>%
filter(!is.na(value)) %>%
ggplot(aes(time, value, group = group, shape = group)) +
stat_summary(geom = "pointrange", fun.data=mean_cl_boot, position=position_dodge(width=.1)) +
stat_summary(geom = "line", fun.data = mean_cl_boot) +
facet_wrap(~ toupper(variable), scales = "free") +
stat_summary(
aes(label = sprintf("%.0f", ..y..)),
fun.y = mean, geom = "text", size = 2.5, vjust = 1.5, hjust = 1.5
) +
xlab("Time") + ylab("kg/cm") +
scale_shape_discrete("") +
ggthemes::theme_tufte()
# ggsave("plots/primary_outcomes_raw.pdf", width = 5, height = 3)
primary_outcomes_raw
Primary outcome models
Our outcome models were defined as:
f.kg <- kg ~ kg_base + bmi_base * time * group + (1 | person)
f.cm <- cm ~ cm_base + bmi_base * time * group + (1 | person)
f.gqol <- gqol2 ~ gqol1 + bmi1 * group
Model fitting was either by ML/REML (using lm
or lmer
) or for Bayesian model fits via MCMC (using rstanarm
).
First with ML:
m.kg <- lmer(f.kg, data = df_long_nobase)
m.cm <- lmer(f.cm, data = df_long_nobase)
# note only have 6 month data for gqol so not lmer
m.gqol <- lm(f.gqol, data = df)
And then some code to extract contrasts we need:
gqol.contrasts <-
emmeans(m.gqol, "group", contr = "trt.vs.ctrlk")$contrasts %>%
tidy() %>%
mutate(time = "Month 6")
# helper to do the same for lmer models
# we see warnings when this runs because of the baseline interaction,
# but we have centered baseline. Contrasts are very similar to models
# without baseline interaction.
getlsmeansforgroup_ <- function(model) {
emmeans(model, pairwise ~ group | time, adjust = "none")$contrasts %>%
tidy()
}
# merge results
bgconts <- bind_rows(
getlsmeansforgroup_(m.kg) %>% mutate(outcome = "Kg"),
getlsmeansforgroup_(m.cm) %>% mutate(outcome = "Cm"),
gqol.contrasts %>% mutate(outcome = "GQOL")
) %>%
select(-level1, -level2) %>%
rename(t = statistic, difference = estimate) %>%
# flip estimates because of contrast coding direction
mutate(difference = -1 * difference, t = -1 * t) %>%
select(outcome, everything()) %>%
# fix p values for table output
mutate(p = round.p(p.value), p = str_replace(p, "=", "")) %>%
select(-p.value) %>%
# fix column names
setNames(c("Outcome", "Followup", "Difference", "Std. Error", "df", "t", "p"))
# These results are later merged with MCMC estimates and raw means so no
# output here - see table below.
Models estimated with MCMC, using rstanarm
package:
m.gqol.stan <- stan_lm(f.gqol, data = df, prior = R2(location = .5))
m.kg.stan <- stan_lmer(kg ~ kg_base + bmi_base * time * group + (1 | person),
data = df_long_nobase)
m.cm.stan <- stan_lmer(cm ~ cm_base + bmi_base * time * group + (1 | person),
data = df_long_nobase)
saveRDS(m.gqol.stan, "saved-chains/m.gqol.stan.RDS")
saveRDS(m.kg.stan, "saved-chains/m.kg.stan.RDS")
saveRDS(m.cm.stan, "saved-chains/m.cm.stan.RDS")
We used the default priors provided by the rstanarm
package, which are weakly informative and pessimistic (i.e. prior for effect size is zero mean). Details of the priors are shown below. Priors are constructed from the data based on rules described here:https://cran.r-project.org/web/packages/rstanarm/vignettes/priors.html#automatic-prior-scale-adjustments. For the GQOL model we chose to set the prior for \(R^2 = .5\), because we included baseline scores in this model which are likely to be highly correlated with followup (e.g. for weight). Setting \(R^2\) location lower would tend to reduce model coefficients, but for a reasonable range of values (we tested \(R^2 = \{.2, .5, .8\}\)) did not change the inferences reported.
m.gqol.stan$prior.info %>% pander()
prior:
- dist: R2
- location: 0.5
- what: mode
prior_intercept:
- dist: NA
- location: NA
- scale: NA
m.kg.stan$prior.info %>% pander()
prior:
- dist: normal
- location: 0, 0, 0, 0, 0, 0, 0 and 0
- scale: 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5 and 2.5
- adjusted_scale: 2.523, 7.153, 38.827, 38.827, 10.123, 10.308, 38.827 and 14.552
- df:
prior_intercept:
- dist: normal
- location: 0
- scale: 10
- adjusted_scale: 155.3
- df:
prior_covariance:
- dist: decov
- regularization: 1
- concentration: 1
- shape: 1
- scale: 1
prior_aux:
- dist: exponential
- location:
- scale:
- adjusted_scale: 15.53
- df:
- rate: 1
- aux_name: sigma
m.cm.stan$prior.info %>% pander()
prior:
- dist: normal
- location: 0, 0, 0, 0, 0, 0, 0 and 0
- scale: 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5 and 2.5
- adjusted_scale: 2.444, 5.954, 32.317, 32.317, 8.426, 8.579, 32.317 and 12.112
- df:
prior_intercept:
- dist: normal
- location: 0
- scale: 10
- adjusted_scale: 129.3
- df:
prior_covariance:
- dist: decov
- regularization: 1
- concentration: 1
- shape: 1
- scale: 1
prior_aux:
- dist: exponential
- location:
- scale:
- adjusted_scale: 12.93
- df:
- rate: 1
- aux_name: sigma
Model convergence
All models appeared to converge satisfactorily, based on visual inspection of MCMC traces and parameter R-hat statistics.
rstan::stan_trace(m.cm.stan)

rstan::stan_trace(m.kg.stan)

rstan::stan_trace(m.gqol.stan)

Rhat and Neff also seemed reasonable:
# Extract Rhat and other stats from each model and summarise
a <- summary(m.cm.stan) %>% as_tibble() %>% mutate(Outcome = "Cm")
b <- summary(m.kg.stan) %>% as_tibble() %>% mutate(Outcome = "Kg")
c <- summary(m.gqol.stan) %>% as_tibble() %>% mutate(Outcome = "GQOL")
model.stats <- bind_rows(a, b, c) %>%
mutate(mcse.scaled = mcse / sd) %>%
select(Rhat, mcse, mcse.scaled, Outcome) %>%
reshape2::melt(id.var = "Outcome") %>%
group_by(Outcome, variable)
model.stats %>%
do(., mean_cl_normal(.$value)) %>%
rename(Statistic = variable, Mean = y, `95% lower` = ymin, `95% upper` = ymax) %>%
pander(caption = "Model convergence statistics averaged across model parameters with 95% CI. Note mcse_scaled = mcse/sd.")
Model convergence statistics averaged across model parameters with 95% CI. Note mcse_scaled = mcse/sd.
Cm |
Rhat |
1.001 |
1.001 |
1.001 |
Cm |
mcse |
0.035 |
0.024 |
0.046 |
Cm |
mcse.scaled |
0.017 |
0.016 |
0.018 |
GQOL |
Rhat |
1.001 |
1.000 |
1.002 |
GQOL |
mcse |
0.049 |
-0.006 |
0.104 |
GQOL |
mcse.scaled |
0.023 |
0.019 |
0.027 |
Kg |
Rhat |
1.000 |
1.000 |
1.000 |
Kg |
mcse |
0.030 |
0.022 |
0.038 |
Kg |
mcse.scaled |
0.017 |
0.016 |
0.017 |
Average treatment effects
Average treatment effects are calculated from our ML models using the emmeans
package. For MCMC models, average treatment effects are computed based on posterior linear predictions (i.e. predictions for the mean using rstanarm::posterior_linpred
or the add_fitted_samples
wrapper in tidybayes
). Where we refer to predictions for individuals or for ‘new patients’ then these are based on full posterior predictions (i.e. rstanarm::posterior_predict
).
Begin by making dataframe defining values to predict-at; we predict at the means of covariate values:
# This is for the GQOL model:
newd <-
expand.grid(
cm_base = 0,
gqol1 = mean(df$gqol1, na.rm = T),
bmi1 = mean(df$bmi1, na.rm = T),
group = unique(df_long$group)
) %>%
as_tibble()
# And for cm and kg (all centered)
newd_long <-
expand.grid(
person = Inf,
cm_base = 0, kg_base = 0, bmi_base = 0,
time = factor(unique(df_long$time)[2:3]),
group = unique(df_long$group)
) %>%
as_tibble()
Make the predictions/intervals for means:
preds.k <- add_fitted_samples(m.kg.stan, newdata = newd_long) %>%
mutate(outcome = "Kg")
preds.c <- add_fitted_samples(m.cm.stan, newdata = newd_long) %>%
mutate(outcome = "Cm")
preds.g <- add_fitted_samples(m.gqol.stan, newdata = newd) %>%
mutate(outcome = "GQOL") %>%
# fill this in; would otherwise be NA and mess up our table
mutate(time = "Month 6")
Then make predictions/intervals for ‘new’ individuals:
preds.k.post <- add_predicted_samples(m.kg.stan, newdata = newd_long) %>%
mutate(outcome = "Kg")
preds.c.post <- add_predicted_samples(m.cm.stan, newdata = newd_long) %>%
mutate(outcome = "Cm")
And bind everything together:
stan.preds <- bind_rows(preds.g, preds.c, preds.k) %>%
ungroup() %>%
select(outcome, estimate, .iteration, .row, group, time) %>%
data.table::melt(id.vars=c("outcome", ".iteration", ".row", "group", "time")) %>%
data.table::dcast(outcome+time+.iteration~group)
stan.preds.post <- bind_rows(preds.c.post, preds.k.post) %>%
ungroup() %>%
select(outcome, pred, .iteration, .row, group, time) %>%
data.table::melt(id.vars=c("outcome", ".iteration", ".row", "group", "time")) %>%
data.table::dcast(outcome+time+.iteration~group)
We use these draws from the model posterior distribution to summarise the average treatment effects:
effect.summary <- stan.preds %>%
group_by(outcome, time) %>%
mean_qi(., FIT - MI) %>%
set_names(c("Outcome", "Followup", "Difference", "lower", "upper", "prob"))
# a table for the raw data in the same format to merge into a single table.
rawlong <-
df_long %>%
select(group, time, cm, kg, gqol) %>%
rename(Kg = kg, Cm = cm, GQOL = gqol) %>%
reshape2::melt(id.var = c("group", "time")) %>%
rename(Outcome = variable) %>%
group_by(group, time, Outcome) %>%
summarise(mu = mean(value, na.rm = T), sd = sd(value, na.rm = T)) %>%
melt(id.var = c("group", "time", "Outcome")) %>%
dcast(Outcome + time ~ group + variable) %>%
# Remove missing row Month 12 GQOL
filter(!is.na(MI_mu)) %>%
reshape2::melt(id.vars=c("Outcome", "time")) %>%
mutate(variable = stringr::str_replace(variable, "_mu", " (mean)")) %>%
mutate(variable = stringr::str_replace(variable, "_sd", " (sd)")) %>%
dcast(Outcome + time ~ variable) %>%
rename(Followup = time)
We then combine predictions from the Bayesian model with ML estimates from above to create the combined table presented in the paper:
# the `bgconts` dataframef containing ML contrasts was created in the
# code futher up, from lmer model fits
between.groups.table <-
left_join(bgconts, effect.summary,
by = c("Outcome", "Followup"),
suffix = c(" (ML)", " (MCMC)")
) %>%
select(-`prob`) %>%
left_join(rawlong, .) %>%
select(-`Difference (ML)`, -`Std. Error`) %>%
rename(`Treatment effect (MCMC)` = `Difference (MCMC)`)
between.groups.table %>%
pander(missing = "", caption = "Between group contrasts (with Satterthwaite corrected degrees of freedom for Kg and Cm) and posterior mean differences (and 95% credible intervals) for the effect of FIT vs. MI at month 6 and 12.")
Between group contrasts (with Satterthwaite corrected degrees of freedom for Kg and Cm) and posterior mean differences (and 95% credible intervals) for the effect of FIT vs. MI at month 6 and 12.
Cm |
Baseline |
106.07 |
13.75 |
105.50 |
12.51 |
|
|
|
|
|
|
Cm |
Month 6 |
99.05 |
12.61 |
102.78 |
13.37 |
205.2 |
-4.727 |
0.001 |
-4.455 |
-6.354 |
-2.614 |
Cm |
Month 12 |
96.97 |
12.59 |
103.04 |
12.45 |
205.9 |
-7.011 |
0.001 |
-6.685 |
-8.586 |
-4.839 |
Kg |
Baseline |
90.48 |
15.90 |
89.13 |
14.76 |
|
|
|
|
|
|
Kg |
Month 6 |
86.37 |
15.07 |
88.39 |
15.72 |
161.0 |
-4.877 |
0.001 |
-3.674 |
-5.175 |
-2.183 |
Kg |
Month 12 |
84.04 |
15.96 |
88.46 |
15.34 |
163.0 |
-7.706 |
0.001 |
-5.931 |
-7.454 |
-4.386 |
GQOL |
Baseline |
62.13 |
10.59 |
61.71 |
14.51 |
|
|
|
|
|
|
GQOL |
Month 6 |
75.81 |
11.66 |
72.53 |
10.42 |
109.0 |
2.107 |
0.001 |
2.837 |
0.061 |
5.655 |
Predictions for individuals: Recasting results from the patient perspective
By simulating data from our models it was possible to estimate the probability that the benefit of FIT over MI for kg lost or WC reduced exceeded a particular value. The code below estimates and plots probabilities for a range of different outcomes: between 0 and 15kg of weight lost or CM reduction in waist.
# given a dataframe of predictions, calculate the probability the effect exceeds `val`
p.for.rope <- function(outcome, df, val) {
df %>%
filter(outcome == outcome & time == "Month 12") %>%
group_by(outcome) %>%
summarise(p = mean((MI - FIT) > val), val = val)
}
area_of_interest <- seq(-2, 15, .25)
ropes.post <- expand.grid(val = area_of_interest, outcome = c("Kg", "Cm")) %>%
rowwise() %>%
do(., p.for.rope(outcome, stan.preds.post, .$val))
p <- ropes.post %>%
# trim extreme values because otherwise loess smoothing
# goes crazy but note that the smoothed line does now match
# the data (uncomment the geom_point(). layer below to check this)
filter(p < .98 & p > .02) %>%
# it's confusing to show weight-gain within this plot, so filter val > 0,
filter(val > 0) %>%
# reverse the outcome
mutate(Outcome = outcome) %>%
# plot
ggplot(aes(x = val, y = p, group = Outcome, linetype = Outcome)) +
geom_smooth(se = F, color = "black", method="loess") +
# geom_point() +
scale_y_continuous(breaks = seq(0, 1, .1), minor_breaks = seq(.05, 1, .05)) +
scale_x_continuous(breaks = seq(0, 15, 2)) +
xlab("Benefit to new patient choosing FIT vs. MI.") +
ylab("Probability") +
scale_color_discrete("")
ggsave("plots/probability_for_ropes.pdf", p, width = 5, height = 3)
p
This table was not included in the paper for brevity, but these figures were used within the text:
ropes.post %>%
ungroup() %>%
group_by(outcome, p.round = round(p, 1)) %>%
filter(p.round %in% seq(.1, .9, .1)) %>%
summarise(effect = mean(val)) %>%
rename(
Outcome = outcome,
Probability = p.round,
Effect = effect
) %>%
melt(id.var = c("Outcome", "Probability")) %>%
dcast(Probability ~ Outcome + variable) %>%
rename(`Cm reduction` = Cm_Effect, `Kg lost` = Kg_Effect) %>%
pander(caption = "Supplementary table giving calculations for verbal descriptives of the effect of FIT vs. MI given in the text. Figures indicate probabilities for a benefit at least as good as the value in the kg or cg columns.")
Supplementary table giving calculations for verbal descriptives of the effect of FIT vs. MI given in the text. Figures indicate probabilities for a benefit at least as good as the value in the kg or cg columns.
0.1 |
14.250 |
11.125 |
0.2 |
12.250 |
9.125 |
0.4 |
8.250 |
6.875 |
0.5 |
6.625 |
6.000 |
0.6 |
5.000 |
5.125 |
0.8 |
1.250 |
2.750 |
0.9 |
-1.000 |
1.000 |
Presenting predicted outcomes in natural frequencies
In the paper we report patient outcomes in nautural frequencies and include waffle plots.
# define thresholds for the pictogram for kg based on % weight lost from baseline:
kg.base <- df$kg1 %>% mean(., na.rm = T)
wtcuts_pc <- c(
-Inf,
-kg.base * .05, -kg.base * .01, kg.base * .01, kg.base * .05, kg.base * .1, kg.base * .15, Inf
)
wtcutlabls_pc <- c("Gain > 5%", "Gain 1-5%", "< 1% change", "Lose 1-5%", "Lose 5-10%", "Lose 10-15%", "Lose > 15%")
# sometimes percentages don't sum to 100 because of rounding, so fix using this
make100 <- function(d) {
shouldbe100 <- sum(d)
if (shouldbe100 != 100) {
# add or remove rounding error from the modal category
ismode <- d == max(d)
stopifnot(sum(ismode) < 2) # only add to one category... will now fail with ties?
d <- d + (100 - shouldbe100) * ismode
}
return(d)
}
bygroup.wd <- stan.preds.post %>%
filter(outcome == "Kg") %>%
rename(i = outcome) %>%
select(i, MI, FIT, time) %>%
reshape2::melt(id.vars = c("i", "time")) %>%
mutate(v = kg.base - value, value = cut(kg.base - value,
breaks = wtcuts_pc,
labels = wtcutlabls_pc
)) %>%
group_by(time, variable) %>%
mutate(N = n()) %>%
group_by(variable, time, value) %>%
summarise(cat = round(n() / N[1] * 100))
# make two dataframes, one for each group
mi.wd <- bygroup.wd %>%
filter(variable == "MI" & time == "Month 12")
fit.wd <- bygroup.wd %>%
filter(variable == "FIT" & time == "Month 12")
# make data lists
mi.waff <- mi.wd$cat
fit.waff <- fit.wd$cat
# make labels and colors
names(mi.waff) <- mi.wd$value
names(fit.waff) <- fit.wd$value
pal <- c("#e34a33", "#fdbb84", "grey", "#b1c9b4", "#41ab5d", "#238443", "#093810")
And finally draw the plots:
plot.a <- waffle::waffle(make100(mi.waff),
colors = pal, flip = T, rows = 5, legend_pos = "left")
ggsave("plots/waffle_mi_percent_kg.pdf", width = 3, height = 4)
plot.a
plot.b <- waffle::waffle(make100(fit.waff),
colors = pal, flip = T, rows = 5, legend_pos = "left")
ggsave("plots/waffle_fit_percent_kg.pdf", width = 3, height = 4)
plot.b
This table not shown in the paper, but some numbers were reported in the text:
bygroup.wd %>%
reshape2::dcast(value ~ time + variable, value.var = "cat") %>%
rename(Outcome = value) %>%
pander(caption = "% patients achieving different outcomes at treatment end and 12 months in MI vs FIT groups.")
% patients achieving different outcomes at treatment end and 12 months in MI vs FIT groups.
Gain > 5% |
9 |
1 |
8 |
0 |
Gain 1-5% |
22 |
6 |
22 |
2 |
< 1% change |
16 |
9 |
16 |
4 |
Lose 1-5% |
31 |
30 |
32 |
19 |
Lose 5-10% |
19 |
37 |
19 |
40 |
Lose 10-15% |
3 |
15 |
3 |
28 |
Lose > 15% |
0 |
2 |
0 |
7 |
An additional table showing probabilities that MI is better than fit, from the perspective of new patients.
stan.preds.post %>%
group_by(time, outcome) %>%
summarise(`Probability FIT preferable to MI` = mean(MI > FIT)) %>%
arrange(-as.numeric(time), outcome) %>%
rename(Outcome = outcome, Followup = time) %>%
pander("Probabilities that MI is better than fit, from the perspective of new patients, at each followup.")
Probabilities that MI is better than fit, from the perspective of new patients, at each followup.
Month 12 |
Cm |
0.852 |
Month 12 |
Kg |
0.944 |
Month 6 |
Cm |
0.742 |
Month 6 |
Kg |
0.838 |