library(table.glue) library(dplyr, warn.conflicts = FALSE) library(tidyr, warn.conflicts = FALSE) library(tibble) library(gt, warn.conflicts = FALSE)
This vignette shows how to go from data to table using table.glue
and friends. For this vignette, we will analyze data from The US National Health and Nutrition Examination Survey (NHANES), 2013-2018.
NHANES was designed to assess the health and nutritional status of the non-institutionalized US population and was conducted by the National Center for Health Statistics of the Centers for Disease Control and Prevention. Since 1999-2000, NHANES has been conducted in two-year cycles using a multistage probability sampling design to select participants. Each cycle is independent with different participants recruited.
The same protocol was followed to measure systolic and diastolic blood pressure (SBP and DBP) in each NHANES cycle. After survey participants had rested 5 minutes, their BP was measured by a trained physician using a mercury sphygmomanometer and an appropriately sized cuff. Three BP measurements were obtained at 30 second intervals.
# see ?table.glue::nhanes for description of all columns glimpse(nhanes) #> Rows: 29,400 #> Columns: 17 #> $ exam <chr> "exam_2013_2014", "exam_2013_2014", "exam_2013_2014"... #> $ seqn <dbl> 73557, 73558, 73559, 73560, 73561, 73562, 73563, 735... #> $ psu <dbl> 1, 1, 1, 2, 2, 1, 1, 1, 2, 1, 2, 2, 1, 2, 1, 2, 2, 2... #> $ strata <dbl> 112, 108, 109, 109, 116, 111, 105, 114, 106, 112, 11... #> $ wts_mec_2yr <dbl> 13481.042, 24471.770, 57193.285, 55766.512, 65541.87... #> $ exam_status <chr> "interview and exam", "interview and exam", "intervi... #> $ age <dbl> 69, 54, 72, 9, 73, 56, 0, 61, 42, 56, 65, 26, 0, 9, ... #> $ sex <chr> "male", "male", "male", "male", "female", "male", "m... #> $ race_ethnicity <chr> "black", "white", "white", "white", "white", "hispan... #> $ education <fct> highschool_or_some_college, highschool_or_some_colle... #> $ pregnant <chr> "no", "no", "no", "no", NA, "no", "no", NA, "no", NA... #> $ bp_sys_mmhg <dbl> 112.66667, 157.33333, 142.00000, 104.66667, 137.3333... #> $ bp_dia_mmhg <dbl> 74.00000, 61.33333, 82.00000, 36.66667, 86.66667, 82... #> $ n_msr_sbp <int> 3, 3, 3, 3, 3, 3, 0, 3, NA, 3, 3, 3, NA, 3, 3, 3, 3,... #> $ n_msr_dbp <int> 3, 3, 3, 3, 3, 3, 0, 3, NA, 3, 3, 3, NA, 3, 3, 3, 3,... #> $ bp_high_aware <chr> "yes", "yes", "yes", NA, "yes", "yes", NA, "yes", "n... #> $ meds_bp <chr> "no", "no", "yes", NA, "yes", "yes", NA, "no", "no",...
We will make a table showing SBP and DBP for subgroups based on sex and race over the three NHANES cycles in nhanes
. But first need to filter to include rows where the survey participant
completed both the NHANES interview and exam
is 20 years or older
is not pregnant
is currently using medication to lower their BP
does not have missing data for SBP or DBP
nhanes_analysis <- nhanes %>% filter(exam_status == 'interview and exam', age >= 20, pregnant == 'no' | is.na(pregnant), meds_bp == 'yes') %>% select(exam, sex, race_ethnicity, bp_sys_mmhg, bp_dia_mmhg) %>% na.omit() nhanes_analysis #> # A tibble: 4,412 x 5 #> exam sex race_ethnicity bp_sys_mmhg bp_dia_mmhg #> <chr> <chr> <chr> <dbl> <dbl> #> 1 exam_2013_2014 male white 142 82 #> 2 exam_2013_2014 female white 137. 86.7 #> 3 exam_2013_2014 male hispanic 157. 82 #> 4 exam_2013_2014 male white 127. 66.7 #> 5 exam_2013_2014 female black 127. 74 #> 6 exam_2013_2014 female other 141. 57.3 #> 7 exam_2013_2014 female black 160 71.3 #> 8 exam_2013_2014 male white 109. 53.3 #> 9 exam_2013_2014 female white 140 82.7 #> 10 exam_2013_2014 female other 140 50 #> # ... with 4,402 more rows
For each NHANES exam and sex/race/ethnicity group, we would like to compute the mean and standard deviation of BP. We’ll use dplyr
for this.
nhanes_bp_summary <- nhanes_analysis %>% group_by(exam, sex, race_ethnicity) %>% summarize( across( .cols = c(bp_sys_mmhg, bp_dia_mmhg), .fns = list( mean = ~ mean(.x), sd = ~ sd(.x) ) ) ) #> `summarise()` regrouping output by 'exam', 'sex' (override with `.groups` argument) nhanes_bp_summary #> # A tibble: 30 x 7 #> # Groups: exam, sex [6] #> exam sex race_ethnicity bp_sys_mmhg_mean bp_sys_mmhg_sd bp_dia_mmhg_mean #> <chr> <chr> <chr> <dbl> <dbl> <dbl> #> 1 exam~ fema~ asian 134. 16.1 73.1 #> 2 exam~ fema~ black 137. 19.8 70.1 #> 3 exam~ fema~ hispanic 134. 23.1 69.9 #> 4 exam~ fema~ other 133. 21.2 70.7 #> 5 exam~ fema~ white 131. 19.5 67.1 #> 6 exam~ male asian 134. 14.7 73.2 #> 7 exam~ male black 133. 19.0 72.6 #> 8 exam~ male hispanic 133 17.2 71.7 #> 9 exam~ male other 132. 15.9 72.6 #> 10 exam~ male white 129. 20.2 69.6 #> # ... with 20 more rows, and 1 more variable: bp_dia_mmhg_sd <dbl>
With our summary data, we can now create the data that will be passed to our table. This is where table.glue
comes into the picture.
rspec <- round_spec() %>% round_half_even() %>% round_using_magnitude( digits = c(2, 1, 1, 0), breaks = c(1, 10, 100, Inf) ) nhanes_bp_strings <- ungroup(nhanes_bp_summary) %>% transmute( exam, sex, race_ethnicity, sbp = table_glue("{bp_sys_mmhg_mean} ({bp_sys_mmhg_sd})", rspec=rspec), dbp = table_glue("{bp_dia_mmhg_mean} ({bp_dia_mmhg_sd})", rspec=rspec) )
Making a table out of nhanes_tbl_strings
is fairly straightforward, but does require a fair bit of code to recode variables and label things.
nhanes_bp_table <- nhanes_bp_strings %>% mutate( race_ethnicity = recode( race_ethnicity, 'hispanic' = "Mexican American", 'asian' = "Non-Hispanic Asian", 'black' = "Non-Hispanic Black", 'white' = "Non-Hispanic White", 'other' = "Other Race - Including Multi-Racial" ), sex = recode( sex, 'female' = 'Female survey participants', 'male' = 'Male survey participants' ), exam = recode( exam, 'exam_2013_2014' = '2013-2014', 'exam_2015_2016' = '2015-2016', 'exam_2017_2018' = '2017-2018' ) ) %>% pivot_wider(names_from = 'exam', values_from = c('sbp', 'dbp')) %>% gt(rowname_col = 'race_ethnicity', groupname_col = 'sex') %>% cols_label( "sbp_2013-2014" = "2013-2014", "sbp_2015-2016" = "2015-2016", "sbp_2017-2018" = "2017-2018", "dbp_2013-2014" = "2013-2014", "dbp_2015-2016" = "2015-2016", "dbp_2017-2018" = "2017-2018" ) %>% cols_align('center') %>% tab_stubhead(label = 'Race / ethnicity') %>% tab_spanner(label = 'Systolic BP, mm Hg', columns = c("sbp_2013-2014", "sbp_2015-2016", "sbp_2017-2018")) %>% tab_spanner(label = 'Diastolic BP, mm Hg', columns = c("dbp_2013-2014", "dbp_2015-2016", "dbp_2017-2018")) %>% tab_header( title = "Systolic and diastolic blood pressure for female and male participants in the US National Health and Nutrition Examination Survey from 2013 - 2018", subtitle = "Table values are mean (standard deviation)" ) nhanes_bp_table
Systolic and diastolic blood pressure for female and male participants in the US National Health and Nutrition Examination Survey from 2013 - 2018 | ||||||
---|---|---|---|---|---|---|
Table values are mean (standard deviation) | ||||||
Race / ethnicity | Systolic BP, mm Hg | Diastolic BP, mm Hg | ||||
2013-2014 | 2015-2016 | 2017-2018 | 2013-2014 | 2015-2016 | 2017-2018 | |
Female survey participants | ||||||
Non-Hispanic Asian | 134 (16.1) | 137 (17.9) | 139 (19.0) | 73.1 (10.7) | 72.3 (12.2) | 74.4 (10.3) |
Non-Hispanic Black | 137 (19.8) | 136 (19.1) | 141 (22.0) | 70.1 (13.3) | 71.0 (15.1) | 73.9 (12.7) |
Mexican American | 134 (23.1) | 133 (16.8) | 142 (19.7) | 69.9 (13.3) | 67.7 (10.8) | 72.4 (11.5) |
Other Race - Including Multi-Racial | 133 (21.2) | 137 (21.9) | 139 (21.2) | 70.7 (11.5) | 67.3 (12.3) | 71.3 (13.6) |
Non-Hispanic White | 131 (19.5) | 135 (21.8) | 137 (21.3) | 67.1 (11.8) | 67.9 (11.7) | 69.1 (15.0) |
Male survey participants | ||||||
Non-Hispanic Asian | 134 (14.7) | 137 (20.4) | 133 (21.4) | 73.2 (11.3) | 71.9 (13.3) | 74.3 (13.1) |
Non-Hispanic Black | 133 (19.0) | 139 (21.0) | 137 (20.2) | 72.6 (12.8) | 74.1 (13.3) | 77.6 (12.5) |
Mexican American | 133 (17.2) | 136 (16.8) | 134 (19.9) | 71.7 (12.1) | 68.9 (12.5) | 73.2 (11.2) |
Other Race - Including Multi-Racial | 132 (15.9) | 133 (15.5) | 133 (19.1) | 72.6 (11.6) | 71.3 (13.2) | 76.9 (12.5) |
Non-Hispanic White | 129 (20.2) | 132 (19.1) | 133 (18.0) | 69.6 (13.3) | 68.6 (12.0) | 71.4 (11.7) |
Now we just need to summarize the table, which can be done using an inline table object. Just like the table, our inline object is created using nhanes_bp_strings
nhanes_bp_inline <- nhanes_bp_strings %>% as_inline(tbl_variables = c('exam', 'sex', 'race_ethnicity'), tbl_value = c('sbp', 'dbp')) # Now you can freely access any cell in the table nhanes_bp_inline$exam_2013_2014$male$black$sbp #> [1] "133 (19.0)" nhanes_bp_inline$exam_2017_2018$male$black$sbp #> [1] "137 (20.2)"
This part is easy: “Among survey participants who were black and female, the mean (standard deviation) of systolic blood pressure increased from 137 (19.8) in 2013-2014 to 141 (22.0) in 2017-2018.”
But what if you don’t want to report the standard deviations? These are helpful in the table but less so in the text. To get rid of these, you can use the bracket helper functions, e.g., bracket_drop()
and bracket_extract()
.
# Drop the bracket bracket_drop(nhanes_bp_inline$exam_2013_2014$male$black$sbp) #> [1] "133" # Get the bracket bracket_extract(nhanes_bp_inline$exam_2017_2018$male$black$sbp) #> [1] "(20.2)"
Let’s re-write our sentence more directly: “Among survey participants who were black and female, the mean systolic blood pressure increased from 137 mm Hg in 2013-2014 to 141 mm Hg in 2017-2018.”