“According to your institutional policies, how important are the following research activities in making promotion decisions in relation to your current career stage? [Publishing a large number of research articles]”
etc.
step1 <- df %>%
select(X23:X41) %>%
pivot_longer(everything(), names_to = "var", values_to = "val") %>%
count(var, val)
## select: dropped 54 variables (X1, X3, X7, X8, X9, …)
## pivot_longer: reorganized (X23, X24, X25, X26, X27, …) into (var, val) [was 198x19, now 3762x2]
## count: now 132 rows and 3 columns, ungrouped
# remove dont knows and not applicables
nas <- step1 %>%
filter(val %in% c("I don't know", "Not applicable"))
## filter: removed 94 rows (71%), 38 rows remaining
pdata <- step1 %>%
anti_join(nas) %>%
mutate(val = fct_relevel(val, "Very important", "Somewhat important",
"Neither important nor unimportant",
"Somewhat unimportant", "Very unimportant")) %>%
make_proportion(group = var, var = val, order_string = "\\simportant$")
## Joining, by = c("var", "val", "n")
## anti_join: added no columns
## > rows only in x 94
## > rows only in y ( 0)
## > matched rows (38)
## > ====
## > rows total 94
## mutate: converted 'val' from character to factor (0 new NA)
## group_by: one grouping variable (var)
## mutate (grouped): new variable 'prop' (double) with 83 unique values and 0% NA
## new variable 'order' (double) with 19 unique values and 0% NA
# get labels
labels <- var_overview %>%
filter(var_id %in% paste0("X", 23:41)) %>%
mutate(label = str_extract(var_full, "(?<=\\[).*?(?=\\s?\\])"),
# clean up labels
label = str_remove(label, "\\(.*"),
label = str_remove(label, ", as assessed .*")) %>%
select(var_id, label)
## filter: removed 73 rows (79%), 19 rows remaining
## mutate: new variable 'label' (character) with 19 unique values and 0% NA
## select: dropped one variable (var_full)
pdata_institution <- pdata %>%
left_join(labels, by = c("var" = "var_id"))
## left_join: added one column (label)
## > rows only in x 0
## > rows only in y ( 0)
## > matched rows 94
## > ====
## > rows total 94
pdata_institution %>%
ungroup() %>%
arrange(desc(order), val) %>%
mutate(prop = scales::percent(prop, accuracy = .1),
summary = glue::glue("{n} ({prop})")) %>%
select(variable = label, value = val, summary) %>%
pivot_wider(names_from = value, values_from = summary,
values_fill = "0 (0.0%)", names_sort = TRUE) %>%
knitr::kable()
## ungroup: no grouping variables
## mutate: converted 'prop' from double to character (0 new NA)
## new variable 'summary' (character) with 80 unique values and 0% NA
## select: renamed 2 variables (variable, value) and dropped 4 variables
## pivot_wider: reorganized (value, summary) into (Very important, Somewhat important, Neither important nor unimportant, Somewhat unimportant, Very unimportant) [was 94x3, now 19x6]
| variable | Very important | Somewhat important | Neither important nor unimportant | Somewhat unimportant | Very unimportant |
|---|---|---|---|---|---|
| Generating funding | 143 (75.3%) | 34 (17.9%) | 9 (4.7%) | 0 (0.0%) | 4 (2.1%) |
| Leading projects | 100 (52.4%) | 67 (35.1%) | 17 (8.9%) | 3 (1.6%) | 4 (2.1%) |
| Generating high-quality publications | 106 (56.1%) | 59 (31.2%) | 11 (5.8%) | 7 (3.7%) | 6 (3.2%) |
| Publishing in highly regarded journals or conferences | 107 (56.3%) | 54 (28.4%) | 16 (8.4%) | 9 (4.7%) | 4 (2.1%) |
| Publishing a large number of research articles | 78 (40.8%) | 81 (42.4%) | 17 (8.9%) | 13 (6.8%) | 2 (1.0%) |
| Mentoring PhDs and postdocs | 87 (46.0%) | 70 (37.0%) | 21 (11.1%) | 4 (2.1%) | 7 (3.7%) |
| Giving invited talks and keynotes | 48 (25.3%) | 97 (51.1%) | 27 (14.2%) | 9 (4.7%) | 9 (4.7%) |
| Receiving awards | 59 (31.7%) | 78 (41.9%) | 36 (19.4%) | 10 (5.4%) | 3 (1.6%) |
| Generating a large number of citations | 58 (30.9%) | 76 (40.4%) | 28 (14.9%) | 18 (9.6%) | 8 (4.3%) |
| Networking activities | 35 (18.4%) | 84 (44.2%) | 43 (22.6%) | 18 (9.5%) | 10 (5.3%) |
| Developing industry collaborations | 37 (20.3%) | 75 (41.2%) | 38 (20.9%) | 16 (8.8%) | 16 (8.8%) |
| Engaging with the public | 30 (16.0%) | 80 (42.6%) | 42 (22.3%) | 18 (9.6%) | 18 (9.6%) |
| Being collegial, helpful and respectful | 44 (23.7%) | 61 (32.8%) | 33 (17.7%) | 20 (10.8%) | 28 (15.1%) |
| Creating intellectual property | 24 (14.1%) | 65 (38.2%) | 41 (24.1%) | 20 (11.8%) | 20 (11.8%) |
| Engaging policy makers | 23 (12.9%) | 60 (33.7%) | 55 (30.9%) | 23 (12.9%) | 17 (9.6%) |
| Contributing to peer review | 24 (12.8%) | 55 (29.3%) | 51 (27.1%) | 37 (19.7%) | 21 (11.2%) |
| Openly sharing research articles | 27 (14.4%) | 37 (19.8%) | 65 (34.8%) | 25 (13.4%) | 33 (17.6%) |
| Openly sharing research code or creating research software | 13 (7.8%) | 36 (21.7%) | 53 (31.9%) | 27 (16.3%) | 37 (22.3%) |
| Openly sharing research data | 17 (9.5%) | 35 (19.6%) | 65 (36.3%) | 26 (14.5%) | 36 (20.1%) |
# colours came from `RColorBrewer::brewer.pal("PiYG", n = 5)`, but with yellow
# as the midpoint
# new colours:
purple <- c('#F9AEEF', '#B96FB0') # light, dark
green <- c('#94D790', "#54984E") # light, dark
blue <- c('#86CFFB', '#3792BD') # light, dark
midpoint <- "#E0C47C"
p1 <- pdata_institution %>%
ggplot(aes(fct_reorder(label, order), prop, fill = val)) +
geom_chicklet(width = .7) +
coord_flip() +
scale_y_continuous(labels = scales::percent) +
scale_fill_manual(values = c("#B96FB0", "#F9AEEF", "#E0C47C", "#94D790",
"#54984E")) +
theme(legend.position = "top") +
guides(fill = guide_legend(nrow = 2, byrow = TRUE)) +
labs(x = NULL, y = NULL, fill = NULL)
# get missings
p_nas <- nas %>%
left_join(select(pdata_institution, var, order)) %>%
left_join(labels, by = c("var" = "var_id")) %>%
distinct() %>%
group_by(label, order) %>%
summarise(n = sum(n))
## select: dropped 4 variables (val, n, prop, label)
## Joining, by = "var"
## left_join: added one column (order)
## > rows only in x 0
## > rows only in y ( 0)
## > matched rows 188 (includes duplicates)
## > =====
## > rows total 188
## left_join: added one column (label)
## > rows only in x 0
## > rows only in y ( 0)
## > matched rows 188
## > =====
## > rows total 188
## distinct: removed 150 rows (80%), 38 rows remaining
## group_by: 2 grouping variables (label, order)
## summarise: now 19 rows and 3 columns, one group variable remaining (label)
p2 <- p_nas %>%
ggplot(aes(y = fct_reorder(label, order), x = n)) +
geom_col(fill = "grey50", width = .7) +
labs(x = NULL, y = "# of answers of 'Not applicable' or 'I don't know'") +
scale_x_continuous(breaks = c(0, 32)) +
scale_y_discrete(position = "right") +
hrbrtheme_fixed(grid = "", plot_margin = margin(l = 5)) +
theme(panel.border = element_rect(fill = NA, colour = "grey80"),
axis.text.y = element_blank(),
axis.title.y = element_text( angle = 270))
p1 +
theme(plot.margin = margin()) + p2 +
plot_layout(widths = c(6, 1)) +
plot_annotation(caption = "n = 198")
“In your own personal opinion, how important should the following research activities be when making promotion decisions? [Publishing a large number of research articles]”
etc.
step1 <- df %>%
select(X47:X65) %>%
pivot_longer(everything(), names_to = "var", values_to = "val") %>%
count(var, val)
## select: dropped 54 variables (X1, X3, X7, X8, X9, …)
## pivot_longer: reorganized (X47, X48, X49, X50, X51, …) into (var, val) [was 198x19, now 3762x2]
## count: now 108 rows and 3 columns, ungrouped
# remove dont knows and not applicables
nas <- step1 %>%
filter(val %in% c("I don't know", "Not applicable"))
## filter: removed 95 rows (88%), 13 rows remaining
pdata <- step1 %>%
anti_join(nas) %>%
mutate(val = fct_relevel(val, "Very important", "Somewhat important",
"Neither important nor unimportant",
"Somewhat unimportant", "Very unimportant")) %>%
make_proportion(group = var, var = val, order_string = "\\simportant$")
## Joining, by = c("var", "val", "n")
## anti_join: added no columns
## > rows only in x 95
## > rows only in y ( 0)
## > matched rows (13)
## > ====
## > rows total 95
## mutate: converted 'val' from character to factor (0 new NA)
## group_by: one grouping variable (var)
## mutate (grouped): new variable 'prop' (double) with 82 unique values and 0% NA
## new variable 'order' (double) with 18 unique values and 0% NA
# get labels
labels <- var_overview %>%
filter(var_id %in% paste0("X", 47:65)) %>%
mutate(label = str_extract(var_full, "(?<=\\[).*?(?=\\s?\\])"),
# clean up labels
label = str_remove(label, "\\(.*"),
label = str_remove(label, ", as assessed .*")) %>%
select(var_id, label)
## filter: removed 73 rows (79%), 19 rows remaining
## mutate: new variable 'label' (character) with 19 unique values and 0% NA
## select: dropped one variable (var_full)
pdata_individual <- pdata %>%
left_join(labels, by = c("var" = "var_id"))
## left_join: added one column (label)
## > rows only in x 0
## > rows only in y ( 0)
## > matched rows 95
## > ====
## > rows total 95
pdata_individual %>%
ungroup() %>%
arrange(desc(order), val) %>%
mutate(prop = scales::percent(prop, accuracy = .1),
summary = glue::glue("{n} ({prop})")) %>%
select(variable = label, value = val, summary) %>%
pivot_wider(names_from = value, values_from = summary,
values_fill = "0 (0.0%)", names_sort = TRUE) %>%
knitr::kable()
## ungroup: no grouping variables
## mutate: converted 'prop' from double to character (0 new NA)
## new variable 'summary' (character) with 75 unique values and 0% NA
## select: renamed 2 variables (variable, value) and dropped 4 variables
## pivot_wider: reorganized (value, summary) into (Very important, Somewhat important, Neither important nor unimportant, Somewhat unimportant, Very unimportant) [was 95x3, now 19x6]
| variable | Very important | Somewhat important | Neither important nor unimportant | Somewhat unimportant | Very unimportant |
|---|---|---|---|---|---|
| Generating high-quality publications | 140 (70.7%) | 51 (25.8%) | 4 (2.0%) | 2 (1.0%) | 1 (0.5%) |
| Mentoring PhDs and postdocs | 122 (62.2%) | 64 (32.7%) | 5 (2.6%) | 3 (1.5%) | 2 (1.0%) |
| Leading projects | 94 (47.5%) | 83 (41.9%) | 14 (7.1%) | 5 (2.5%) | 2 (1.0%) |
| Being collegial, helpful and respectful | 117 (59.4%) | 58 (29.4%) | 13 (6.6%) | 6 (3.0%) | 3 (1.5%) |
| Giving invited talks and keynotes | 55 (27.8%) | 107 (54.0%) | 28 (14.1%) | 5 (2.5%) | 3 (1.5%) |
| Contributing to peer review | 75 (37.9%) | 87 (43.9%) | 29 (14.6%) | 5 (2.5%) | 2 (1.0%) |
| Generating funding | 73 (36.9%) | 88 (44.4%) | 23 (11.6%) | 8 (4.0%) | 6 (3.0%) |
| Publishing in highly regarded journals or conferences | 71 (35.9%) | 82 (41.4%) | 24 (12.1%) | 12 (6.1%) | 9 (4.5%) |
| Networking activities | 59 (29.8%) | 90 (45.5%) | 34 (17.2%) | 11 (5.6%) | 4 (2.0%) |
| Generating a large number of citations | 44 (22.3%) | 104 (52.8%) | 35 (17.8%) | 10 (5.1%) | 4 (2.0%) |
| Engaging with the public | 67 (34.0%) | 76 (38.6%) | 35 (17.8%) | 11 (5.6%) | 8 (4.1%) |
| Openly sharing research data | 67 (34.9%) | 72 (37.5%) | 36 (18.8%) | 7 (3.6%) | 10 (5.2%) |
| Publishing a large number of research articles | 35 (17.7%) | 99 (50.0%) | 40 (20.2%) | 21 (10.6%) | 3 (1.5%) |
| Openly sharing research articles | 69 (34.8%) | 63 (31.8%) | 41 (20.7%) | 12 (6.1%) | 13 (6.6%) |
| Openly sharing research code or creating research software | 54 (31.0%) | 62 (35.6%) | 41 (23.6%) | 8 (4.6%) | 9 (5.2%) |
| Engaging policy makers | 40 (20.8%) | 86 (44.8%) | 45 (23.4%) | 14 (7.3%) | 7 (3.6%) |
| Receiving awards | 40 (20.2%) | 79 (39.9%) | 52 (26.3%) | 13 (6.6%) | 14 (7.1%) |
| Developing industry collaborations | 39 (20.1%) | 74 (38.1%) | 45 (23.2%) | 17 (8.8%) | 19 (9.8%) |
| Creating intellectual property | 26 (14.4%) | 60 (33.1%) | 51 (28.2%) | 26 (14.4%) | 18 (9.9%) |
p1 <- pdata_individual %>%
ggplot(aes(fct_reorder(label, order), prop, fill = val)) +
geom_chicklet(width = .7) +
coord_flip() +
scale_y_continuous(labels = scales::percent) +
scale_fill_manual(values = c("#B96FB0", "#F9AEEF", "#E0C47C", "#94D790",
"#54984E")) +
theme(legend.position = "top") +
guides(fill = guide_legend(nrow = 2, byrow = TRUE)) +
labs(x = NULL, y = NULL, fill = NULL)
# get missings
p_nas <- nas %>%
full_join(distinct(pdata_individual, var, order)) %>%
replace_na(list(n = 0)) %>%
left_join(labels, by = c("var" = "var_id")) %>%
group_by(label, order) %>%
summarise(n = sum(n))
## distinct (grouped): removed 76 rows (80%), 19 rows remaining
## Joining, by = "var"
## full_join: added one column (order)
## > rows only in x 0
## > rows only in y 10
## > matched rows 13
## > ====
## > rows total 23
## replace_na: changed 10 values (43%) of 'n' (10 fewer NA)
## left_join: added one column (label)
## > rows only in x 0
## > rows only in y ( 0)
## > matched rows 23
## > ====
## > rows total 23
## group_by: 2 grouping variables (label, order)
## summarise: now 19 rows and 3 columns, one group variable remaining (label)
p2 <- p_nas %>%
ggplot(aes(y = fct_reorder(label, order), x = n)) +
geom_col(fill = "grey50", width = .7) +
labs(x = NULL, y = "# of answers of 'Not applicable' or 'I don't know'") +
scale_x_continuous(breaks = c(0, 24)) +
scale_y_discrete(position = "right") +
hrbrtheme_fixed(grid = "", plot_margin = margin(l = 5)) +
theme(panel.border = element_rect(fill = NA, colour = "grey80"),
axis.text.y = element_blank(),
axis.title.y = element_text( angle = 270))
p1 +
theme(plot.margin = margin()) + p2 +
plot_layout(widths = c(6, 1)) +
plot_annotation(caption = "n = 198")
institutional_values <- get_values(df, X23:X41, source = "institutional",
var_overview)
## select: dropped 53 variables (X3, X7, X8, X9, X10, …)
## pivot_longer: reorganized (X23, X24, X25, X26, X27, …) into (var, val) [was 198x20, now 3762x3]
## mutate: new variable 'num_val' (integer) with 6 unique values and 6% NA
## filter: removed 244 rows (6%), 3,518 rows remaining
## mutate: new variable 'source' (character) with one unique value and 0% NA
## filter: removed 73 rows (79%), 19 rows remaining
## mutate: new variable 'label' (character) with 19 unique values and 0% NA
## select: dropped one variable (var_full)
## left_join: added one column (label)
## > rows only in x 0
## > rows only in y ( 0)
## > matched rows 3,518
## > =======
## > rows total 3,518
## select: dropped one variable (var)
personal_values <- get_values(df, X47:X65, source = "personal",
var_overview)
## select: dropped 53 variables (X3, X7, X8, X9, X10, …)
## pivot_longer: reorganized (X47, X48, X49, X50, X51, …) into (var, val) [was 198x20, now 3762x3]
## mutate: new variable 'num_val' (integer) with 6 unique values and 2% NA
## filter: removed 62 rows (2%), 3,700 rows remaining
## mutate: new variable 'source' (character) with one unique value and 0% NA
## filter: removed 73 rows (79%), 19 rows remaining
## mutate: new variable 'label' (character) with 19 unique values and 0% NA
## select: dropped one variable (var_full)
## left_join: added one column (label)
## > rows only in x 0
## > rows only in y ( 0)
## > matched rows 3,700
## > =======
## > rows total 3,700
## select: dropped one variable (var)
institutional_scores <- bootstrap_values(institutional_values)
## group_by: 2 grouping variables (label, source)
## summarise: now 19 rows and 3 columns, one group variable remaining (label)
personal_scores <- bootstrap_values(personal_values)
## group_by: 2 grouping variables (label, source)
## summarise: now 19 rows and 3 columns, one group variable remaining (label)
all_scores <- bind_rows(institutional_scores, personal_scores)
all_scores
## # A tibble: 38 × 5
## # Groups: label [19]
## label source Mean Lower Upper
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 "Being collegial, helpful and respectful" institutional 2.61 2.41 2.79
## 2 "Contributing to peer review" institutional 2.87 2.70 3.04
## 3 "Creating intellectual property " institutional 2.69 2.51 2.88
## 4 "Developing industry collaborations" institutional 2.45 2.28 2.60
## 5 "Engaging policy makers" institutional 2.72 2.56 2.88
## 6 "Engaging with the public " institutional 2.54 2.38 2.71
## 7 "Generating a large number of citations" institutional 2.16 2 2.32
## 8 "Generating funding" institutional 1.36 1.24 1.47
## 9 "Generating high-quality publications" institutional 1.67 1.53 1.81
## 10 "Giving invited talks and keynotes" institutional 2.13 1.99 2.27
## # … with 28 more rows
# compute differences
differences <- all_scores %>%
select(label, Mean, source) %>%
pivot_wider(names_from = source, values_from = Mean) %>%
mutate(diff = personal - institutional)
## select: dropped 2 variables (Lower, Upper)
## pivot_wider: reorganized (Mean, source) into (institutional, personal) [was 38x3, now 19x3]
## mutate (grouped): new variable 'diff' (double) with 19 unique values and 0% NA
differences %>%
rename(Question = label) %>%
arrange(diff) %>%
mutate(across(where(is.numeric), ~round(.x, digits = 2))) %>%
knitr::kable()
## rename: renamed one variable (Question)
## mutate (grouped): changed 18 values (95%) of 'institutional' (0 new NA)
## changed 18 values (95%) of 'personal' (0 new NA)
## changed 19 values (100%) of 'diff' (0 new NA)
| Question | institutional | personal | diff |
|---|---|---|---|
| Openly sharing research data | 3.16 | 2.07 | -1.09 |
| Openly sharing research code or creating research software | 3.23 | 2.17 | -1.06 |
| Being collegial, helpful and respectful | 2.61 | 1.58 | -1.03 |
| Contributing to peer review | 2.87 | 1.85 | -1.02 |
| Openly sharing research articles | 3.00 | 2.18 | -0.82 |
| Engaging with the public | 2.54 | 2.07 | -0.47 |
| Engaging policy makers | 2.72 | 2.28 | -0.44 |
| Networking activities | 2.39 | 2.05 | -0.34 |
| Mentoring PhDs and postdocs | 1.80 | 1.46 | -0.34 |
| Generating high-quality publications | 1.67 | 1.35 | -0.32 |
| Giving invited talks and keynotes | 2.13 | 1.96 | -0.17 |
| Generating a large number of citations | 2.16 | 2.12 | -0.04 |
| Leading projects | 1.66 | 1.68 | 0.02 |
| Creating intellectual property | 2.69 | 2.72 | 0.04 |
| Developing industry collaborations | 2.45 | 2.50 | 0.05 |
| Publishing in highly regarded journals or conferences | 1.68 | 2.02 | 0.34 |
| Receiving awards | 2.03 | 2.40 | 0.37 |
| Publishing a large number of research articles | 1.85 | 2.28 | 0.43 |
| Generating funding | 1.36 | 1.92 | 0.56 |
final_scores <- all_scores %>%
left_join(differences) %>%
select(-institutional, -personal)
## Joining, by = "label"
## left_join: added 3 columns (institutional, personal, diff)
## > rows only in x 0
## > rows only in y ( 0)
## > matched rows 38
## > ====
## > rows total 38
## select: dropped 2 variables (institutional, personal)
final_scores
## # A tibble: 38 × 6
## # Groups: label [19]
## label source Mean Lower Upper diff
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 "Being collegial, helpful and respectful" institut… 2.61 2.41 2.79 -1.03
## 2 "Contributing to peer review" institut… 2.87 2.70 3.04 -1.02
## 3 "Creating intellectual property " institut… 2.69 2.51 2.88 0.0355
## 4 "Developing industry collaborations" institut… 2.45 2.28 2.60 0.0549
## 5 "Engaging policy makers" institut… 2.72 2.56 2.88 -0.443
## 6 "Engaging with the public " institut… 2.54 2.38 2.71 -0.471
## 7 "Generating a large number of citations" institut… 2.16 2 2.32 -0.0428
## 8 "Generating funding" institut… 1.36 1.24 1.47 0.561
## 9 "Generating high-quality publications" institut… 1.67 1.53 1.81 -0.318
## 10 "Giving invited talks and keynotes" institut… 2.13 1.99 2.27 -0.167
## # … with 28 more rows
dodge_width <- .2
custom_blue <- "#3792BD"
yellow <- "#E0C47C"
# "#F9AEEF", "#E0C47C", "#94D790",
bigger_text_size <- 15
pdata <- final_scores %>%
mutate(source = recode(source, personal = "Personal view",
institutional = "Perceived institutional view"))
## mutate (grouped): changed 38 values (100%) of 'source' (0 new NA)
comparison <- pdata %>%
ggplot(aes(Mean, fct_rev(fct_reorder(str_wrap(label, 40), diff)))) +
geom_linerange(aes(xmin = Lower, xmax = Upper, group = fct_rev(source)),
position = position_dodge(width = dodge_width),
colour = "grey60") +
geom_point(aes(colour = fct_rev(source)), size = 2.6,
position = position_dodge(width = dodge_width)) +
five_point_scale() +
scale_colour_manual(values = c(
`Personal view` = yellow, #"#B96FB0",
`Perceived institutional view` = custom_blue #"#54984E"
)) +
labs(y = NULL, colour = NULL, x = NULL) +
theme(legend.position = "top", plot.margin = margin(),
legend.text = element_text(size = bigger_text_size),
axis.text.y = element_text(size = bigger_text_size),
axis.text.x = element_text(size = bigger_text_size))
comparison
arrows <- png::readPNG(here::here("analysis-notebooks/comparison-addition.PNG"),
native = TRUE)
comparison + wrap_elements(panel = arrows) +
plot_layout(widths = c(5, 1))
Wondering: are correlations within individuals similar to above picture, or is this another angle?
cor_data <- bind_rows(personal_values, institutional_values) %>%
select(-val) %>%
pivot_wider(names_from = "source", values_from = "num_val") %>%
group_by(label)
## select: dropped one variable (val)
## pivot_wider: reorganized (num_val, source) into (personal, institutional) [was 7218x4, now 3720x4]
## group_by: one grouping variable (label)
cor_data %>%
summarise(correlation = cor(personal, institutional, use = "pairwise",
method = "spearman")) %>%
arrange(correlation) %>%
mutate(correlation = round(correlation, 3))
## summarise: now 19 rows and 2 columns, ungrouped
## mutate: changed 19 values (100%) of 'correlation' (0 new NA)
## # A tibble: 19 × 2
## label correlation
## <chr> <dbl>
## 1 "Generating funding" 0.175
## 2 "Publishing in highly regarded journals or conferences " 0.258
## 3 "Generating high-quality publications" 0.307
## 4 "Leading projects" 0.313
## 5 "Giving invited talks and keynotes" 0.319
## 6 "Contributing to peer review" 0.332
## 7 "Being collegial, helpful and respectful" 0.336
## 8 "Engaging with the public " 0.369
## 9 "Publishing a large number of research articles" 0.396
## 10 "Receiving awards" 0.398
## 11 "Mentoring PhDs and postdocs" 0.407
## 12 "Networking activities " 0.409
## 13 "Generating a large number of citations" 0.422
## 14 "Openly sharing research code or creating research software" 0.423
## 15 "Openly sharing research data" 0.429
## 16 "Openly sharing research articles " 0.438
## 17 "Developing industry collaborations" 0.459
## 18 "Engaging policy makers" 0.459
## 19 "Creating intellectual property " 0.478
conf_ints <- cor_data %>%
summarise(res = list(cor.test(personal, institutional, conf.level = .995))) %>%
mutate(cor = map_dbl(res, pluck, "estimate"),
conf_int = map(res, pluck, "conf.int"),
lower = map_dbl(conf_int, pluck, 1),
upper = map_dbl(conf_int, pluck, 2)) %>%
select(label, cor, lower, upper) %>%
arrange(cor)
## summarise: now 19 rows and 2 columns, ungrouped
## mutate: new variable 'cor' (double) with 19 unique values and 0% NA
## new variable 'conf_int' (list) with 19 unique values and 0% NA
## new variable 'lower' (double) with 19 unique values and 0% NA
## new variable 'upper' (double) with 19 unique values and 0% NA
## select: dropped 2 variables (res, conf_int)
conf_ints
## # A tibble: 19 × 4
## label cor lower upper
## <chr> <dbl> <dbl> <dbl>
## 1 "Publishing in highly regarded journals or conferences " 0.208 0.00600 0.394
## 2 "Generating funding" 0.240 0.0395 0.422
## 3 "Being collegial, helpful and respectful" 0.291 0.0913 0.468
## 4 "Leading projects" 0.310 0.116 0.482
## 5 "Generating high-quality publications" 0.316 0.121 0.488
## 6 "Contributing to peer review" 0.329 0.134 0.499
## 7 "Mentoring PhDs and postdocs" 0.371 0.180 0.534
## 8 "Engaging with the public " 0.372 0.182 0.535
## 9 "Giving invited talks and keynotes" 0.380 0.192 0.541
## 10 "Receiving awards" 0.397 0.210 0.557
## 11 "Publishing a large number of research articles" 0.407 0.224 0.563
## 12 "Generating a large number of citations" 0.418 0.234 0.573
## 13 "Networking activities " 0.423 0.241 0.576
## 14 "Openly sharing research code or creating research softw… 0.429 0.230 0.595
## 15 "Openly sharing research data" 0.434 0.247 0.590
## 16 "Engaging policy makers" 0.451 0.267 0.604
## 17 "Openly sharing research articles " 0.464 0.287 0.610
## 18 "Creating intellectual property " 0.476 0.290 0.627
## 19 "Developing industry collaborations" 0.478 0.300 0.624
conf_ints %>%
ggplot(aes(cor, fct_reorder(label, cor))) +
geom_linerange(aes(xmin = lower, xmax = upper)) +
geom_point(size = 2, colour = custom_blue) +
labs(x = "Correlation between personal and perceived institutional view",
y = NULL)
There is big uncertainty in the estimates, and the linear approach inherent with correlations might not be the right model. Following this approach, it seems that for some questions personal and institutional views are more aligned than for others. This does not follow the exact same pattern as with the above comparison.
To explore a little further how these values come about.
Now do mean differences instead of difference of means as in original analysis.
cor_data %>%
summarise(diff = mean(personal - institutional, na.rm = TRUE)) %>%
arrange(diff) %>%
mutate(diff = round(diff, 3)) %>%
knitr::kable()
## summarise: now 19 rows and 2 columns, ungrouped
## mutate: changed 19 values (100%) of 'diff' (0 new NA)
| label | diff |
|---|---|
| Openly sharing research data | -1.107 |
| Being collegial, helpful and respectful | -1.049 |
| Contributing to peer review | -1.048 |
| Openly sharing research code or creating research software | -1.006 |
| Openly sharing research articles | -0.856 |
| Engaging with the public | -0.492 |
| Engaging policy makers | -0.463 |
| Networking activities | -0.358 |
| Mentoring PhDs and postdocs | -0.326 |
| Generating high-quality publications | -0.307 |
| Giving invited talks and keynotes | -0.168 |
| Developing industry collaborations | -0.022 |
| Generating a large number of citations | -0.021 |
| Creating intellectual property | 0.006 |
| Leading projects | 0.021 |
| Publishing in highly regarded journals or conferences | 0.347 |
| Receiving awards | 0.355 |
| Publishing a large number of research articles | 0.450 |
| Generating funding | 0.563 |
This is not identical but very close to our original figure.
cor_data %>%
filter(str_detect(label, "Openly.*")) %>%
tabyl(personal, institutional) %>%
custom_adorns() %>%
knitr::kable()
## filter (grouped): removed 3,146 rows (85%), 574 rows remaining
| personal | 1 | 2 | 3 | 4 | 5 | NA_ | Total |
|---|---|---|---|---|---|---|---|
| 1 | 24.2% (46) | 24.7% (47) | 23.7% (45) | 11.6% (22) | 8.4% (16) | 7.4% (14) | 100.0% (190) |
| 2 | 4.6% (9) | 27.4% (54) | 36.5% (72) | 8.1% (16) | 18.3% (36) | 5.1% (10) | 100.0% (197) |
| 3 | 1.7% (2) | 5.1% (6) | 45.8% (54) | 22.0% (26) | 16.9% (20) | 8.5% (10) | 100.0% (118) |
| 4 | 0.0% (0) | 3.7% (1) | 18.5% (5) | 25.9% (7) | 37.0% (10) | 14.8% (4) | 100.0% (27) |
| 5 | 0.0% (0) | 0.0% (0) | 12.5% (4) | 18.8% (6) | 56.2% (18) | 12.5% (4) | 100.0% (32) |
| NA | 0.0% (0) | 0.0% (0) | 30.0% (3) | 10.0% (1) | 60.0% (6) | 0.0% (0) | 100.0% (10) |
| Total | 9.9% (57) | 18.8% (108) | 31.9% (183) | 13.6% (78) | 18.5% (106) | 7.3% (42) | 100.0% (574) |
cor_data %>%
filter(str_detect(label, "highly regarded")) %>%
tabyl(personal, institutional) %>%
custom_adorns() %>%
knitr::kable()
## filter (grouped): removed 3,522 rows (95%), 198 rows remaining
| personal | 1 | 2 | 3 | 4 | 5 | NA_ | Total |
|---|---|---|---|---|---|---|---|
| 1 | 67.6% (48) | 15.5% (11) | 8.5% (6) | 2.8% (2) | 1.4% (1) | 4.2% (3) | 100.0% (71) |
| 2 | 53.7% (44) | 32.9% (27) | 4.9% (4) | 2.4% (2) | 1.2% (1) | 4.9% (4) | 100.0% (82) |
| 3 | 29.2% (7) | 41.7% (10) | 8.3% (2) | 12.5% (3) | 8.3% (2) | 0.0% (0) | 100.0% (24) |
| 4 | 33.3% (4) | 16.7% (2) | 25.0% (3) | 16.7% (2) | 0.0% (0) | 8.3% (1) | 100.0% (12) |
| 5 | 44.4% (4) | 44.4% (4) | 11.1% (1) | 0.0% (0) | 0.0% (0) | 0.0% (0) | 100.0% (9) |
| Total | 54.0% (107) | 27.3% (54) | 8.1% (16) | 4.5% (9) | 2.0% (4) | 4.0% (8) | 100.0% (198) |
cor_data %>%
filter(str_detect(label, "code")) %>%
tabyl(personal, institutional) %>%
custom_adorns() %>%
knitr::kable()
## filter (grouped): removed 3,538 rows (95%), 182 rows remaining
| personal | 1 | 2 | 3 | 4 | 5 | NA_ | Total |
|---|---|---|---|---|---|---|---|
| 1 | 20.4% (11) | 27.8% (15) | 24.1% (13) | 13.0% (7) | 7.4% (4) | 7.4% (4) | 100.0% (54) |
| 2 | 1.6% (1) | 30.6% (19) | 29.0% (18) | 9.7% (6) | 19.4% (12) | 9.7% (6) | 100.0% (62) |
| 3 | 2.4% (1) | 2.4% (1) | 43.9% (18) | 19.5% (8) | 19.5% (8) | 12.2% (5) | 100.0% (41) |
| 4 | 0.0% (0) | 12.5% (1) | 12.5% (1) | 25.0% (2) | 37.5% (3) | 12.5% (1) | 100.0% (8) |
| 5 | 0.0% (0) | 0.0% (0) | 11.1% (1) | 33.3% (3) | 55.6% (5) | 0.0% (0) | 100.0% (9) |
| NA | 0.0% (0) | 0.0% (0) | 25.0% (2) | 12.5% (1) | 62.5% (5) | 0.0% (0) | 100.0% (8) |
| Total | 7.1% (13) | 19.8% (36) | 29.1% (53) | 14.8% (27) | 20.3% (37) | 8.8% (16) | 100.0% (182) |
all_Values <- bind_rows(institutional_values, personal_values) %>%
mutate(label = str_remove(label, "\\s$")) %>%
left_join(expanded_labels) %>%
rename(id = X1)
## mutate: changed 1,897 values (26%) of 'label' (0 new NA)
## Joining, by = "label"
## left_join: added one column (group)
## > rows only in x 0
## > rows only in y ( 0)
## > matched rows 7,218
## > =======
## > rows total 7,218
## rename: renamed one variable (id)
all_Values
## # A tibble: 7,218 × 6
## id val num_val source label group
## <dbl> <chr> <int> <chr> <chr> <chr>
## 1 10 Very important 1 institutional Publishi… Trad…
## 2 10 Very important 1 institutional Generati… Trad…
## 3 10 Very important 1 institutional Generati… Alte…
## 4 10 Somewhat important 2 institutional Publishi… Trad…
## 5 10 Neither important nor unimportant 3 institutional Openly s… Alte…
## 6 10 Neither important nor unimportant 3 institutional Openly s… Alte…
## 7 10 Neither important nor unimportant 3 institutional Openly s… Alte…
## 8 10 Neither important nor unimportant 3 institutional Engaging… Soci…
## 9 10 Somewhat important 2 institutional Developi… Soci…
## 10 10 Somewhat important 2 institutional Creating… Soci…
## # … with 7,208 more rows
scores_by_individual <- all_Values %>%
group_by(id, group, source) %>%
summarise(score = mean(num_val))
## group_by: 3 grouping variables (id, group, source)
## summarise: now 1,947 rows and 4 columns, 2 group variables remaining (id, group)
demographics <- df %>%
select(id = X1, disciplines_recoded_wos, acad_age = X87, gender = X84,
acad_rol = X85) %>%
mutate(# fix two cases where data was entered incorrectly
acad_age = case_when(acad_age == 19999 ~ 1999,
acad_age == 84 ~ 1984,
TRUE ~ acad_age),
acad_age = 2021 - acad_age)
## select: renamed 4 variables (id, acad_age, gender, acad_rol) and dropped 68 variables
## mutate: changed 198 values (100%) of 'acad_age' (0 new NA)
demographics %>%
ggplot(aes(acad_age)) +
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
scores_by_individual <- scores_by_individual %>%
left_join(demographics)
## Joining, by = "id"
## left_join: added 4 columns (disciplines_recoded_wos, acad_age, gender,
## acad_rol)
## > rows only in x 0
## > rows only in y ( 0)
## > matched rows 1,947
## > =======
## > rows total 1,947
scores_by_individual
## # A tibble: 1,947 × 8
## # Groups: id, group [990]
## id group source score disci…¹ acad_…² gender acad_…³
## <dbl> <chr> <chr> <dbl> <chr> <dbl> <chr> <chr>
## 1 10 Academic leadership insti… 1.5 Physic… 36 Male Profes…
## 2 10 Academic leadership perso… 1.75 Physic… 36 Male Profes…
## 3 10 Alternative academic impact insti… 2.5 Physic… 36 Male Profes…
## 4 10 Alternative academic impact perso… 2.5 Physic… 36 Male Profes…
## 5 10 Community activities insti… 3.25 Physic… 36 Male Profes…
## 6 10 Community activities perso… 2.75 Physic… 36 Male Profes…
## 7 10 Societal & economic impact insti… 2.5 Physic… 36 Male Profes…
## 8 10 Societal & economic impact perso… 2.5 Physic… 36 Male Profes…
## 9 10 Traditional academic impact insti… 1.33 Physic… 36 Male Profes…
## 10 10 Traditional academic impact perso… 1.33 Physic… 36 Male Profes…
## # … with 1,937 more rows, and abbreviated variable names
## # ¹disciplines_recoded_wos, ²acad_age, ³acad_rol
scores_by_individual %>%
mutate(group = str_wrap(group, width = 12)) %>%
filter(!is.na(disciplines_recoded_wos)) %>%
ggplot(aes(score, disciplines_recoded_wos)) +
geom_boxplot(width = .7, outlier.alpha = 0, notch = TRUE,
fill = custom_blue) +
#geom_jitter(height = .1, width = .01, colour = custom_blue, alpha = .5) +
five_point_scale(compact = TRUE) +
facet_grid(rows = vars(group), cols = vars(source)) +
labs(y = NULL) +
theme(panel.spacing.x = unit(4, "lines"))
## mutate (grouped): changed 1,947 values (100%) of 'group' (0 new NA)
## filter (grouped): removed 10 rows (1%), 1,937 rows remaining
## Notch went outside hinges
## ℹ Do you want `notch = FALSE`?
## Notch went outside hinges
## ℹ Do you want `notch = FALSE`?
## Notch went outside hinges
## ℹ Do you want `notch = FALSE`?
## Notch went outside hinges
## ℹ Do you want `notch = FALSE`?
## Notch went outside hinges
## ℹ Do you want `notch = FALSE`?
## Notch went outside hinges
## ℹ Do you want `notch = FALSE`?
## Notch went outside hinges
## ℹ Do you want `notch = FALSE`?
## Notch went outside hinges
## ℹ Do you want `notch = FALSE`?
## Notch went outside hinges
## ℹ Do you want `notch = FALSE`?
## Notch went outside hinges
## ℹ Do you want `notch = FALSE`?
## Notch went outside hinges
## ℹ Do you want `notch = FALSE`?
## Notch went outside hinges
## ℹ Do you want `notch = FALSE`?
## Notch went outside hinges
## ℹ Do you want `notch = FALSE`?
## Notch went outside hinges
## ℹ Do you want `notch = FALSE`?
## Notch went outside hinges
## ℹ Do you want `notch = FALSE`?
No substantive differences. Sample size is too small to allow drawing conclusions here, since precisions would need to be much higher, given that we are conducting a large number of comparisons at once.
scores_by_individual %>%
mutate(group = str_wrap(group, width = 12)) %>%
filter(!is.na(gender), gender != "Prefer not to say") %>%
ggplot(aes(score, gender)) +
geom_boxplot(width = .7, outlier.alpha = 0, notch = TRUE) +
geom_jitter(height = .05, width = .01, colour = custom_blue, alpha = .2) +
facet_grid(rows = vars(group), cols = vars(source)) +
five_point_scale(compact = TRUE) +
theme(panel.spacing.x = unit(4, "lines")) +
labs(y = NULL, x = "Aggregated score across indicator group")
## mutate (grouped): changed 1,947 values (100%) of 'group' (0 new NA)
## filter (grouped): removed 20 rows (1%), 1,927 rows remaining
There are no differences regarding the institutional side. Regarding personal views, women tend to give higher importance to community activities and societal and economic impact than men, all other aspects being similar.
This finding should be backed up by checking the gender distribution across disciplines. Life sci & Biomed also has a higher score in these two items, although only slightly.
library(tidylog)
tab <- demographics %>%
filter(!is.na(disciplines_recoded_wos), gender != "Prefer not to say") %>%
tabyl(disciplines_recoded_wos, gender)
## filter: removed 3 rows (2%), 195 rows remaining
tab %>%
chisq.test()
##
## Pearson's Chi-squared test
##
## data: .
## X-squared = 18.055, df = 4, p-value = 0.001204
tab %>%
custom_adorns() %>%
knitr::kable()
| disciplines_recoded_wos | Female | Male | Total |
|---|---|---|---|
| Arts & Humanities | 28.6% (2) | 71.4% (5) | 100.0% (7) |
| Life Sciences & Biomedicine | 42.5% (37) | 57.5% (50) | 100.0% (87) |
| Physical Sciences | 18.5% (5) | 81.5% (22) | 100.0% (27) |
| Social Sciences | 25.7% (9) | 74.3% (26) | 100.0% (35) |
| Technology | 7.7% (3) | 92.3% (36) | 100.0% (39) |
| Total | 28.7% (56) | 71.3% (139) | 100.0% (195) |
Women are much better represented in Life Sci & Biomed.
It would therefore seem, that the higher higher importance to community activities and societal and economic impact can still be attributed to gender, and not discipline.
In causal terms, gender would have an impact on scores, as well as on discipline, but discipline only on scores, and not on gender.
To assess the total causal effect of gender, we therefore do not need to control for gender. For the partial effect of gender alone, we would need to control for discipline.
model_data <- scores_by_individual %>%
filter(!is.na(gender), gender != "Prefer not to say") %>%
filter(source == "personal", group == "Societal & economic impact")
m <- aov(score ~ gender, data = model_data)
summary(m)
## Df Sum Sq Mean Sq F value Pr(>F)
## gender 1 2.19 2.1945 2.893 0.0906 .
## Residuals 194 147.15 0.7585
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
m2 <- aov(score ~ gender + disciplines_recoded_wos, data = model_data)
summary(m2)
## Df Sum Sq Mean Sq F value Pr(>F)
## gender 1 2.03 2.0301 2.671 0.104
## disciplines_recoded_wos 4 2.03 0.5087 0.669 0.614
## Residuals 189 143.64 0.7600
## 1 observation deleted due to missingness
model_data <- scores_by_individual %>%
filter(!is.na(gender), gender != "Prefer not to say") %>%
filter(source == "personal", group == "Community activities")
m <- aov(score ~ gender, data = model_data)
summary(m)
## Df Sum Sq Mean Sq F value Pr(>F)
## gender 1 0.81 0.8095 1.822 0.179
## Residuals 194 86.17 0.4442
m2 <- aov(score ~ gender + disciplines_recoded_wos, data = model_data)
summary(m2)
## Df Sum Sq Mean Sq F value Pr(>F)
## gender 1 0.84 0.8402 1.854 0.175
## disciplines_recoded_wos 4 0.35 0.0885 0.195 0.941
## Residuals 189 85.66 0.4533
## 1 observation deleted due to missingness
In both cases, differences between genders regarding the personal view on community activities and societal & economic impact are not statistically significant. This does not change (in either direction) when also controlling for gender. Overall, however, this method is not well suited to the data, given that the data are in the interval [0, 1], but the method (ANOVA) assumes normally distributed error terms, which is not the case here.
scores_by_individual %>%
ungroup() %>%
mutate(country_lumped = fct_lump_n(country, 3)) %>%
mutate(group = str_wrap(group, width = 12)) %>%
ggplot(aes(score, country_lumped)) +
geom_boxplot(width = .7, outlier.alpha = 0, notch = TRUE) +
geom_jitter(height = .05, width = .01, colour = custom_blue, alpha = .2) +
facet_grid(rows = vars(group), cols = vars(source)) +
five_point_scale(compact = TRUE) +
theme(panel.spacing.x = unit(4, "lines")) +
labs(y = NULL, x = "Aggregated score across indicator group")
This comparison is again fraught with difficulties, since there are numerous pairs which can be compared. The main difference that stands out is the relatively low perceived institutional relevance that respondents from the USA report for societal & economic impact. It is hard to draw any further conclusions, given the small sample size.
We can test this specific table, but would need to reduce the significance threshold at least by the factor 10 (5% -> 0.5%).
model_data <- scores_by_individual %>%
ungroup() %>%
mutate(country_lumped = fct_lump_n(country, 3)) %>%
filter(source == "institutional", group == "Societal & economic impact")
m <- aov(score ~ country_lumped, data = model_data)
summary(m)
As such, the p-value is not small enough to meet this threshold. So the results are still not strong enough to discard the assumption that all means in this group might be similar.
scores_by_individual %>%
group_by(source, group) %>%
summarise(res = list(cor.test(acad_age, score, conf.level = .995))) %>%
mutate(cor = map_dbl(res, pluck, "estimate"),
conf_int = map(res, pluck, "conf.int"),
lower = map_dbl(conf_int, pluck, 1),
upper = map_dbl(conf_int, pluck, 2)) %>%
select(source, group, cor, lower, upper) %>%
arrange(source, desc(cor)) %>%
mutate(across(where(is.numeric), round, digits = 2)) %>%
knitr::kable()
| source | group | cor | lower | upper |
|---|---|---|---|---|
| institutional | Alternative academic impact | 0.09 | -0.11 | 0.29 |
| institutional | Societal & economic impact | 0.06 | -0.15 | 0.26 |
| institutional | Community activities | 0.06 | -0.15 | 0.26 |
| institutional | Academic leadership | 0.03 | -0.17 | 0.23 |
| institutional | Traditional academic impact | -0.01 | -0.21 | 0.19 |
| personal | Alternative academic impact | 0.15 | -0.05 | 0.34 |
| personal | Societal & economic impact | 0.02 | -0.18 | 0.22 |
| personal | Academic leadership | 0.01 | -0.19 | 0.21 |
| personal | Community activities | 0.01 | -0.19 | 0.20 |
| personal | Traditional academic impact | -0.09 | -0.29 | 0.11 |
There are no substantive correlations. The only one which could be claimed as being weak, is in the personal view on alternative academic impact.
scores_by_individual %>%
ggplot(aes(acad_age, score)) +
geom_jitter(height = .1, width = .5, alpha = .5) +
geom_smooth(colour = custom_blue) +
facet_grid(rows = vars(group),
cols = vars(source)) +
scale_y_continuous(
breaks = c(1, 3, 5),
labels = c("Very important", "Neither/nor", "Very unimportant")
) +
labs(x = "Academic age", y = NULL)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
The low linear relationship is confirmed via the plot. However, there seems to be some inverse U-shaped pattern, where older and younger respondents give higher weight to some of the concepts, while middle aged respondents (academic age of 15-30 years) give them lower weight. However, these trends are not present in all cases and overall not very strong.