First, set up your working environment by loading these 4 packages using the library() function. Note that you can copy any of the code blocks below by hovering over the top right corner, clicking the copy icon that appears, and then pasting the code into your local version of R to run it.
It is common to count data, particularly in categories, to summarize characteristics or outcomes. The tabyl function in the {janitor} package is helpful for this.
In this vignette, we will look at how to use this function to make simple tables of counts of your data.
First, we will read in the data from strep_tb and indo_rct to use for our tables. Load the libraries in the setup chunk above. If you then run the code chunk below, you should have two new data objects in your Environment tab.
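The data-loading step might look like the sketch below. It assumes, which the text above does not state, that both datasets ship with the {medicaldata} package; adjust the source if your copies come from elsewhere.

```r
# Assumption: strep_tb and indo_rct are bundled with the {medicaldata} package
library(medicaldata)

# Copy the packaged datasets into the global environment
strep_tb <- medicaldata::strep_tb
indo_rct <- medicaldata::indo_rct

# Quick check of the dimensions (rows, columns) of each object
dim(strep_tb)
dim(indo_rct)
```

After running this, both objects should be listed in the Environment tab.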
We will now try to reproduce Table 2 from the Streptomycin for Tuberculosis manuscript, which can be found here on page 771. These are the primary endpoint results, summarized in a 6 rows x 2 columns table (with an added totals row). Let’s walk through how to do this with the tabyl() function in the {janitor} package.
The tabyl() function allows you to pipe data into tables, and add ‘adornments’ like total rows and percentages. Let’s start with a basic one-variable tabyl using this ordinal endpoint. You can pipe the dataset into the tabyl function, with your desired variable (radiologic_6m) as the only argument to the function.
strep_tb %>%
  tabyl(radiologic_6m)
#>                 radiologic_6m  n    percent
#>    6_Considerable_improvement 32 0.29906542
#>        5_Moderate_improvement 23 0.21495327
#>                   4_No_change  5 0.04672897
#>      3_Moderate_deterioration 17 0.15887850
#>  2_Considerable_deterioration 12 0.11214953
#>                       1_Death 18 0.16822430
This gives us the n and proportion of each level of the primary outcome.
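If you would rather see percentages than raw proportions in a one-variable table, the same adorn functions apply. The sketch below uses a small made-up data frame (toy and its outcome column are hypothetical, not trial data) so that it runs on its own.

```r
library(janitor)
library(magrittr)  # provides the %>% pipe

# Hypothetical toy data, not taken from the trial
toy <- data.frame(
  outcome = c("improved", "improved", "no_change", "worse", "improved")
)

one_way <- toy %>%
  tabyl(outcome) %>%                 # counts and proportions per level
  adorn_totals("row") %>%            # add a Total row
  adorn_pct_formatting(digits = 1)   # show proportions as percentages

one_way
```

The percent column becomes character text after formatting, so do any arithmetic on the proportions before this step.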
If we add the treatment arm variable as a second argument, we come closer to the original table. Note that the levels of the first argument make up the rows of the table, and the levels of the second argument make up the columns (standard R x C order). Also note that with 2 variables we get counts by default but no proportions, because tabyl() cannot know whether you want column-wise or row-wise proportions.
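The two-variable call described above can be sketched as follows. It assumes that strep_tb comes from the {medicaldata} package, which the text does not state.

```r
library(janitor)
library(magrittr)     # provides the %>% pipe
library(medicaldata)  # assumption: source of strep_tb

# Rows come from the first variable, columns from the second
two_way <- medicaldata::strep_tb %>%
  tabyl(radiologic_6m, arm)

two_way
```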
This is closer, but lacks the total row and the percentages. Because the column totals are calculated from the counts, we add the total row first, before converting anything to percentages. We will ‘adorn’ the table with a totals row, using the where argument of the adorn_totals() function to specify that we want an additional row of totals at the bottom (not a column of row-wise totals on the right).
strep_tb %>%
  tabyl(radiologic_6m, arm) %>%
  adorn_totals(where = "row") # add a total row
#>                 radiologic_6m Streptomycin Control
#>    6_Considerable_improvement           28       4
#>        5_Moderate_improvement           10      13
#>                   4_No_change            2       3
#>      3_Moderate_deterioration            5      12
#>  2_Considerable_deterioration            6       6
#>                       1_Death            4      14
#>                         Total           55      52
This is closer. Now we need to add the percentages and the percentage formatting. We need to specify that we want column-wise, rather than row-wise, percentages. We then use adorn_ns() to add the counts back in, so that we have both counts and percentages. We can specify that we want the counts listed first with the argument position = "front" in the adorn_ns() function.
strep_tb %>%
  tabyl(radiologic_6m, arm) %>% # 2-dimensional table, R x C
  adorn_totals(where = "row") %>% # add totals row
  adorn_percentages("col") %>% # column-wise percentages
  adorn_pct_formatting() %>%
  adorn_ns(position = "front") # put n first
#>                 radiologic_6m Streptomycin     Control
#>    6_Considerable_improvement   28 (50.9%)    4 (7.7%)
#>        5_Moderate_improvement   10 (18.2%)  13 (25.0%)
#>                   4_No_change     2 (3.6%)    3 (5.8%)
#>      3_Moderate_deterioration     5 (9.1%)  12 (23.1%)
#>  2_Considerable_deterioration    6 (10.9%)   6 (11.5%)
#>                       1_Death     4 (7.3%)  14 (26.9%)
#>                         Total  55 (100.0%) 52 (100.0%)
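To see why the choice of denominator matters, here is a minimal sketch comparing "col" and "row" percentages on a made-up data frame (toy, arm, and result are hypothetical names, not trial data).

```r
library(janitor)
library(magrittr)  # provides the %>% pipe

# Hypothetical toy data: arm A has 3 better / 1 worse, arm B has 2 better / 2 worse
toy <- data.frame(
  arm    = c("A", "A", "A", "A", "B", "B", "B", "B"),
  result = c("better", "better", "better", "worse",
             "better", "better", "worse", "worse")
)

counts <- toy %>% tabyl(result, arm)

# Column-wise: each arm's column sums to 1 (share of each result within an arm)
col_pct <- counts %>% adorn_percentages("col")

# Row-wise: each result's row sums to 1 (share of each arm within a result)
row_pct <- counts %>% adorn_percentages("row")

col_pct
row_pct
```

For a trial table with arms in the columns, column-wise percentages are usually the ones that answer "what fraction of each arm had this outcome?".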
You can pipe this table into the flextable() function to create a flextable object, which makes it easy to add fancy formatting. There are many formatting options in the {flextable} package, which you can learn about here. You can control column width, fonts, colors, and much more once you are in flextable format. Flextables can be output to MS Word, PowerPoint, HTML, and PDF through R Markdown.
strep_tb %>%
  tabyl(radiologic_6m, arm) %>%
  adorn_totals(where = "row") %>%
  adorn_percentages("col") %>% # column-wise percentages
  adorn_pct_formatting() %>%
  adorn_ns(position = "front") %>% # put n first
  flextable::flextable()
radiologic_6m | Streptomycin | Control
6_Considerable_improvement | 28 (50.9%) | 4 (7.7%)
5_Moderate_improvement | 10 (18.2%) | 13 (25.0%)
4_No_change | 2 (3.6%) | 3 (5.8%)
3_Moderate_deterioration | 5 (9.1%) | 12 (23.1%)
2_Considerable_deterioration | 6 (10.9%) | 6 (11.5%)
1_Death | 4 (7.3%) | 14 (26.9%)
Total | 55 (100.0%) | 52 (100.0%)
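As one illustration of what you can do once the table is a flextable, the sketch below bolds the header, auto-sizes the columns, and writes the result to a Word file. The toy data frame and the temporary output path are hypothetical; bold(), autofit(), and save_as_docx() are {flextable} functions.

```r
library(janitor)
library(magrittr)   # provides the %>% pipe
library(flextable)

# Hypothetical toy data, not taken from the trial
toy <- data.frame(
  arm    = c("A", "A", "A", "A", "B", "B", "B", "B"),
  result = c("better", "better", "better", "worse",
             "better", "better", "worse", "worse")
)

ft <- toy %>%
  tabyl(result, arm) %>%
  adorn_totals(where = "row") %>%
  flextable() %>%
  bold(part = "header") %>%  # bold the header row
  autofit()                  # size columns to fit their contents

# Write to a temporary Word file; the file name is arbitrary
out_file <- tempfile(fileext = ".docx")
save_as_docx(ft, path = out_file)
```

Replace the temporary path with a real file name to keep the document.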
Now try this yourself, but instead of the ordinal radiologic_6m outcome, use the dichotomous outcome improved in its place. Copy the code block below and add the piping and additional lines to produce a 2 x 2 table of outcomes. You can search the web for janitor tabyl adorn_title to learn how to add a title to your table.
strep_tb
#> # A tibble: 107 × 13
#>    patient_id arm     dose_strep_g dose_PAS_g gender baseline_condition
#>    <chr>      <fct>          <dbl>      <dbl> <fct>  <fct>
#>  1 0001       Control            0          0 M      1_Good
#>  2 0002       Control            0          0 F      1_Good
#>  3 0003       Control            0          0 F      1_Good
#>  4 0004       Control            0          0 M      1_Good
#>  5 0005       Control            0          0 F      1_Good
#>  6 0006       Control            0          0 M      1_Good
#>  7 0007       Control            0          0 F      1_Good
#>  8 0008       Control            0          0 M      1_Good
#>  9 0009       Control            0          0 F      2_Fair
#> 10 0010       Control            0          0 M      2_Fair
#> # … with 97 more rows, and 7 more variables: baseline_temp <fct>,
#> #   baseline_esr <fct>, baseline_cavitation <fct>, strep_resistance <fct>,
#> #   radiologic_6m <fct>, rad_num <dbl>, improved <lgl>
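If you get stuck on the title, the general shape of an adorn_title() call is sketched below on a made-up data frame (toy, arm, and result are hypothetical names), so the exercise itself is left for you. The placement, row_name, and col_name arguments belong to adorn_title() in {janitor}.

```r
library(janitor)
library(magrittr)  # provides the %>% pipe

# Hypothetical toy data, not taken from the trial
toy <- data.frame(
  arm    = c("A", "A", "A", "A", "B", "B", "B", "B"),
  result = c("better", "better", "better", "worse",
             "better", "better", "worse", "worse")
)

titled <- toy %>%
  tabyl(result, arm) %>%
  adorn_totals(where = "row") %>%
  # "combined" puts both variable names into the first column header
  adorn_title(placement = "combined", row_name = "Result", col_name = "Arm")

titled
```

Call adorn_title() last, since it changes the column headers that the other adorn functions rely on.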
Now try to do this with the indo_rct dataset, using the treatment variable rx and the outcome variable outcome. Add a total row, percentages, and a title.
indo_rct
#> # A tibble: 602 × 33
#>       id site    age  risk gender   outcome sod   pep   recpanc psphinc precut
#>    <dbl> <fct> <dbl> <dbl> <fct>    <fct>   <fct> <fct> <fct>   <fct>   <fct>
#>  1  1001 1_UM     26   2   1_female 1_yes   1_yes 0_no  1_yes   0_no    0_no
#>  2  1002 1_UM     24   1   2_male   0_no    0_no  1_yes 0_no    0_no    0_no
#>  3  1003 1_UM     57   1   1_female 0_no    1_yes 0_no  0_no    0_no    0_no
#>  4  1004 1_UM     29   2   1_female 1_yes   1_yes 0_no  0_no    0_no    0_no
#>  5  1005 1_UM     38   3.5 1_female 0_no    1_yes 1_yes 0_no    1_yes   0_no
#>  6  1006 1_UM     59   3   1_female 0_no    1_yes 0_no  0_no    0_no    1_yes
#>  7  1007 1_UM     60   1.5 1_female 0_no    0_no  0_no  1_yes   0_no    0_no
#>  8  1008 1_UM     29   1   2_male   0_no    0_no  0_no  0_no    0_no    0_no
#>  9  1009 1_UM     53   2   2_male   0_no    0_no  0_no  1_yes   0_no    0_no
#> 10  1010 1_UM     20   2   2_male   0_no    0_no  0_no  0_no    0_no    1_yes
#> # … with 592 more rows, and 22 more variables: difcan <fct>, pneudil <fct>,
#> #   amp <fct>, paninj <fct>, acinar <fct>, brush <fct>, asa81 <fct>,
#> #   asa325 <fct>, asa <fct>, prophystent <fct>, therastent <fct>,
#> #   pdstent <fct>, sodsom <fct>, bsphinc <fct>, bstent <fct>, chole <fct>,
#> #   pbmal <fct>, train <fct>, status <fct>, type <fct>, rx <fct>, bleed <dbl>