library(table.glue)
library(dplyr, warn.conflicts = FALSE)
library(tidyr, warn.conflicts = FALSE)
library(tibble)
library(gt, warn.conflicts = FALSE)

Overview

This vignette shows how to go from data to table using table.glue and friends. For this vignette, we will analyze data from The US National Health and Nutrition Examination Survey (NHANES), 2013-2018.

NHANES was designed to assess the health and nutritional status of the non-institutionalized US population and was conducted by the National Center for Health Statistics of the Centers for Disease Control and Prevention. Since 1999-2000, NHANES has been conducted in two-year cycles using a multistage probability sampling design to select participants. Each cycle is independent with different participants recruited.

NHANES Blood pressure measurements

The same protocol was followed to measure systolic and diastolic blood pressure (SBP and DBP) in each NHANES cycle. After survey participants had rested 5 minutes, their BP was measured by a trained physician using a mercury sphygmomanometer and an appropriately sized cuff. Three BP measurements were obtained at 30 second intervals.


# see ?table.glue::nhanes for description of all columns

glimpse(nhanes)
#> Rows: 29,400
#> Columns: 17
#> $ exam           <chr> "exam_2013_2014", "exam_2013_2014", "exam_2013_2014"...
#> $ seqn           <dbl> 73557, 73558, 73559, 73560, 73561, 73562, 73563, 735...
#> $ psu            <dbl> 1, 1, 1, 2, 2, 1, 1, 1, 2, 1, 2, 2, 1, 2, 1, 2, 2, 2...
#> $ strata         <dbl> 112, 108, 109, 109, 116, 111, 105, 114, 106, 112, 11...
#> $ wts_mec_2yr    <dbl> 13481.042, 24471.770, 57193.285, 55766.512, 65541.87...
#> $ exam_status    <chr> "interview and exam", "interview and exam", "intervi...
#> $ age            <dbl> 69, 54, 72, 9, 73, 56, 0, 61, 42, 56, 65, 26, 0, 9, ...
#> $ sex            <chr> "male", "male", "male", "male", "female", "male", "m...
#> $ race_ethnicity <chr> "black", "white", "white", "white", "white", "hispan...
#> $ education      <fct> highschool_or_some_college, highschool_or_some_colle...
#> $ pregnant       <chr> "no", "no", "no", "no", NA, "no", "no", NA, "no", NA...
#> $ bp_sys_mmhg    <dbl> 112.66667, 157.33333, 142.00000, 104.66667, 137.3333...
#> $ bp_dia_mmhg    <dbl> 74.00000, 61.33333, 82.00000, 36.66667, 86.66667, 82...
#> $ n_msr_sbp      <int> 3, 3, 3, 3, 3, 3, 0, 3, NA, 3, 3, 3, NA, 3, 3, 3, 3,...
#> $ n_msr_dbp      <int> 3, 3, 3, 3, 3, 3, 0, 3, NA, 3, 3, 3, NA, 3, 3, 3, 3,...
#> $ bp_high_aware  <chr> "yes", "yes", "yes", NA, "yes", "yes", NA, "yes", "n...
#> $ meds_bp        <chr> "no", "no", "yes", NA, "yes", "yes", NA, "no", "no",...

Making the analysis data

We will make a table showing SBP and DBP for subgroups based on sex and race over the three NHANES cycles in nhanes. But first need to filter to include rows where the survey participant

  • completed both the NHANES interview and exam

  • is 20 years or older

  • is not pregnant

  • is currently using medication to lower their BP

  • does not have missing data for SBP or DBP


nhanes_analysis <- nhanes %>% 
  filter(exam_status == 'interview and exam', 
         age >= 20, 
         pregnant == 'no' | is.na(pregnant),
         meds_bp == 'yes') %>% 
  select(exam, sex, race_ethnicity, bp_sys_mmhg, bp_dia_mmhg) %>% 
  na.omit()

nhanes_analysis
#> # A tibble: 4,412 x 5
#>    exam           sex    race_ethnicity bp_sys_mmhg bp_dia_mmhg
#>    <chr>          <chr>  <chr>                <dbl>       <dbl>
#>  1 exam_2013_2014 male   white                 142         82  
#>  2 exam_2013_2014 female white                 137.        86.7
#>  3 exam_2013_2014 male   hispanic              157.        82  
#>  4 exam_2013_2014 male   white                 127.        66.7
#>  5 exam_2013_2014 female black                 127.        74  
#>  6 exam_2013_2014 female other                 141.        57.3
#>  7 exam_2013_2014 female black                 160         71.3
#>  8 exam_2013_2014 male   white                 109.        53.3
#>  9 exam_2013_2014 female white                 140         82.7
#> 10 exam_2013_2014 female other                 140         50  
#> # ... with 4,402 more rows

Making the summary data

For each NHANES exam and sex/race/ethnicity group, we would like to compute the mean and standard deviation of BP. We’ll use dplyr for this.


nhanes_bp_summary <- nhanes_analysis %>% 
  group_by(exam, sex, race_ethnicity) %>% 
  summarize(
    across(
      .cols = c(bp_sys_mmhg, bp_dia_mmhg),
      .fns = list(
        mean = ~ mean(.x),
        sd = ~ sd(.x)
      )
    )
  )
#> `summarise()` regrouping output by 'exam', 'sex' (override with `.groups` argument)

nhanes_bp_summary
#> # A tibble: 30 x 7
#> # Groups:   exam, sex [6]
#>    exam  sex   race_ethnicity bp_sys_mmhg_mean bp_sys_mmhg_sd bp_dia_mmhg_mean
#>    <chr> <chr> <chr>                     <dbl>          <dbl>            <dbl>
#>  1 exam~ fema~ asian                      134.           16.1             73.1
#>  2 exam~ fema~ black                      137.           19.8             70.1
#>  3 exam~ fema~ hispanic                   134.           23.1             69.9
#>  4 exam~ fema~ other                      133.           21.2             70.7
#>  5 exam~ fema~ white                      131.           19.5             67.1
#>  6 exam~ male  asian                      134.           14.7             73.2
#>  7 exam~ male  black                      133.           19.0             72.6
#>  8 exam~ male  hispanic                   133            17.2             71.7
#>  9 exam~ male  other                      132.           15.9             72.6
#> 10 exam~ male  white                      129.           20.2             69.6
#> # ... with 20 more rows, and 1 more variable: bp_dia_mmhg_sd <dbl>

Making the summary strings

With our summary data, we can now create the data that will be passed to our table. This is where table.glue comes into the picture.


rspec <- round_spec() %>% 
  round_half_even() %>% 
  round_using_magnitude(
    digits = c(2, 1,  1,   0),
    breaks = c(1, 10, 100, Inf)
  )

nhanes_bp_strings <- ungroup(nhanes_bp_summary) %>% 
  transmute(
    exam,
    sex, 
    race_ethnicity,
    sbp = table_glue("{bp_sys_mmhg_mean} ({bp_sys_mmhg_sd})", rspec=rspec),
    dbp = table_glue("{bp_dia_mmhg_mean} ({bp_dia_mmhg_sd})", rspec=rspec)
  )

Making the table object

Making a table out of nhanes_tbl_strings is fairly straightforward, but does require a fair bit of code to recode variables and label things.


nhanes_bp_table <- nhanes_bp_strings %>% 
  mutate(
    race_ethnicity = recode(
      race_ethnicity,
      'hispanic' = "Mexican American",
      'asian' = "Non-Hispanic Asian",
      'black' = "Non-Hispanic Black",
      'white' = "Non-Hispanic White",
      'other' = "Other Race - Including Multi-Racial"
    ),
    sex = recode(
      sex, 
      'female' = 'Female survey participants', 
      'male' = 'Male survey participants'
    ),
    exam = recode(
      exam,
      'exam_2013_2014' = '2013-2014',
      'exam_2015_2016' = '2015-2016',
      'exam_2017_2018' = '2017-2018'
    )
  ) %>%
  pivot_wider(names_from = 'exam',
              values_from = c('sbp', 'dbp')) %>% 
  gt(rowname_col = 'race_ethnicity', 
     groupname_col = 'sex') %>% 
  cols_label(
    "sbp_2013-2014" = "2013-2014",
    "sbp_2015-2016" = "2015-2016",
    "sbp_2017-2018" = "2017-2018",
    "dbp_2013-2014" = "2013-2014",
    "dbp_2015-2016" = "2015-2016",
    "dbp_2017-2018" = "2017-2018"
  ) %>% 
  cols_align('center') %>% 
  tab_stubhead(label = 'Race / ethnicity') %>% 
  tab_spanner(label = 'Systolic BP, mm Hg',
              columns = c("sbp_2013-2014",
                          "sbp_2015-2016",
                          "sbp_2017-2018")) %>% 
  tab_spanner(label = 'Diastolic BP, mm Hg',
              columns = c("dbp_2013-2014",
                          "dbp_2015-2016",
                          "dbp_2017-2018")) %>% 
  tab_header(
    title = "Systolic and diastolic blood pressure for female and male participants in the US National Health and Nutrition Examination Survey from 2013 - 2018", 
    subtitle = "Table values are mean (standard deviation)"
  )

nhanes_bp_table
Systolic and diastolic blood pressure for female and male participants in the US National Health and Nutrition Examination Survey from 2013 - 2018
Table values are mean (standard deviation)
Race / ethnicity Systolic BP, mm Hg Diastolic BP, mm Hg
2013-2014 2015-2016 2017-2018 2013-2014 2015-2016 2017-2018
Female survey participants
Non-Hispanic Asian 134 (16.1) 137 (17.9) 139 (19.0) 73.1 (10.7) 72.3 (12.2) 74.4 (10.3)
Non-Hispanic Black 137 (19.8) 136 (19.1) 141 (22.0) 70.1 (13.3) 71.0 (15.1) 73.9 (12.7)
Mexican American 134 (23.1) 133 (16.8) 142 (19.7) 69.9 (13.3) 67.7 (10.8) 72.4 (11.5)
Other Race - Including Multi-Racial 133 (21.2) 137 (21.9) 139 (21.2) 70.7 (11.5) 67.3 (12.3) 71.3 (13.6)
Non-Hispanic White 131 (19.5) 135 (21.8) 137 (21.3) 67.1 (11.8) 67.9 (11.7) 69.1 (15.0)
Male survey participants
Non-Hispanic Asian 134 (14.7) 137 (20.4) 133 (21.4) 73.2 (11.3) 71.9 (13.3) 74.3 (13.1)
Non-Hispanic Black 133 (19.0) 139 (21.0) 137 (20.2) 72.6 (12.8) 74.1 (13.3) 77.6 (12.5)
Mexican American 133 (17.2) 136 (16.8) 134 (19.9) 71.7 (12.1) 68.9 (12.5) 73.2 (11.2)
Other Race - Including Multi-Racial 132 (15.9) 133 (15.5) 133 (19.1) 72.6 (11.6) 71.3 (13.2) 76.9 (12.5)
Non-Hispanic White 129 (20.2) 132 (19.1) 133 (18.0) 69.6 (13.3) 68.6 (12.0) 71.4 (11.7)

Making the inline object

Now we just need to summarize the table, which can be done using an inline table object. Just like the table, our inline object is created using nhanes_bp_strings


nhanes_bp_inline <- nhanes_bp_strings %>% 
  as_inline(tbl_variables = c('exam', 'sex', 'race_ethnicity'),
            tbl_value = c('sbp', 'dbp'))

# Now you can freely access any cell in the table
nhanes_bp_inline$exam_2013_2014$male$black$sbp
#> [1] "133 (19.0)"
nhanes_bp_inline$exam_2017_2018$male$black$sbp 
#> [1] "137 (20.2)"

Using the inline object

This part is easy: “Among survey participants who were black and female, the mean (standard deviation) of systolic blood pressure increased from 137 (19.8) in 2013-2014 to 141 (22.0) in 2017-2018.”

But what if you don’t want to report the standard deviations? These are helpful in the table but less so in the text. To get rid of these, you can use the bracket helper functions, e.g., bracket_drop() and bracket_extract().


# Drop the bracket
bracket_drop(nhanes_bp_inline$exam_2013_2014$male$black$sbp)
#> [1] "133"
# Get the bracket
bracket_extract(nhanes_bp_inline$exam_2017_2018$male$black$sbp)
#> [1] "(20.2)"

Let’s re-write our sentence more directly: “Among survey participants who were black and female, the mean systolic blood pressure increased from 137 mm Hg in 2013-2014 to 141 mm Hg in 2017-2018.”

Summary

This case study has covered three big ideas:

  1. Making summary data

  2. Making summary strings

  3. Making the table and inline object

The workflow covered here is designed to fit almost any table and work with any package that creates tables. Happy tabulating!