This vignette is meant for those who wish to contribute to {gtsummary}, or users who wish to gain an understanding of the inner workings of a {gtsummary} object so they may more easily modify it to suit their own needs. If this does not describe you, please refer to the {gtsummary} website for an introduction on how to use the package's functions and for tutorials on advanced use.
Every {gtsummary} table has a few characteristics common among all tables created with the package. Here, we review those characteristics, and provide instructions on how to construct a {gtsummary} object.
library(gtsummary)

tbl_regression_ex <-
  lm(age ~ grade + marker, trial) %>%
  tbl_regression() %>%
  bold_p(t = 0.5)

tbl_summary_ex <-
  trial %>%
  select(trt, age, grade, response) %>%
  tbl_summary(by = trt)
Every {gtsummary} object is a list comprising, at minimum, these elements: .$table_body and .$table_styling.
The .$table_body object is the data frame that will ultimately be printed as the output. The table must include columns "label", "row_type", and "variable". The "label" column is printed, and the other two are hidden from the final output.
tbl_summary_ex$table_body
#> # A tibble: 8 x 7
#> variable var_type var_label row_type label stat_1 stat_2
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 age continuous Age label Age 46 (37, 59) 48 (3~
#> 2 age continuous Age missing Unknown 7 4
#> 3 grade categorical Grade label Grade <NA> <NA>
#> 4 grade categorical Grade level I 35 (36%) 33 (3~
#> 5 grade categorical Grade level II 32 (33%) 36 (3~
#> 6 grade categorical Grade level III 31 (32%) 33 (3~
#> 7 response dichotomous Tumor Response label Tumor Response 28 (29%) 33 (3~
#> 8 response dichotomous Tumor Response missing Unknown 3 4
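As a quick check of the requirement above, the following sketch builds a small table from the `trial` data and confirms the three required columns are present. The object names `tbl`, `required_cols`, and `has_required` are illustrative only.

```r
library(gtsummary)

# Build a small summary table from the `trial` data shipped with {gtsummary}
tbl <- tbl_summary(trial, include = c(age, grade))

# The three columns every .$table_body is expected to carry
required_cols <- c("variable", "row_type", "label")

# TRUE when all required columns are present
has_required <- all(required_cols %in% names(tbl$table_body))
has_required
```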
The .$table_styling object is a list of data frames containing information about how .$table_body is printed, formatted, and styled.
The list contains the following data frames: header, footnote, footnote_abbrev, fmt_fun, text_format, fmt_missing, and cols_merge; and the following objects: source_note, caption, and horizontal_line_above.
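To see which elements a particular installation actually stores, the list can be inspected directly. This is a minimal sketch; the exact set of names may differ between {gtsummary} releases, so no specific output is assumed here.

```r
library(gtsummary)

tbl <- tbl_summary(trial, include = c(age, grade))

# The styling instructions live in a plain named list
styling <- tbl$table_styling
names(styling)

# Which of the elements are data frames (as opposed to single objects)?
is_df <- vapply(styling, is.data.frame, logical(1))
names(styling)[is_df]
```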
header
The header table has the following columns, with one row per column found in .$table_body. The table contains styling information that applies to an entire column or to the column headers.
Column | Description |
---|---|
column | Column name from .$table_body |
hide | Logical indicating whether the column is hidden in the output |
align | Specifies the alignment/justification of the column, e.g. 'center' or 'left' |
label | Label that will be displayed (if column is displayed in output) |
interpret_label | The {gt} function that is used to interpret the column label, e.g. gt::md |
spanning_header | Includes text printed above columns as spanning headers. |
interpret_spanning_header | The {gt} function that is used to interpret the column spanning headers, e.g. gt::md |
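The relationship between header and .$table_body can be checked directly. The sketch below compares the `column` values against the column names of the table body and lists the columns that are not hidden; the variable names are illustrative.

```r
library(gtsummary)

tbl <- tbl_summary(trial, include = c(age, grade))
header <- tbl$table_styling$header

# One header row per column of .$table_body
same_columns <- setequal(header$column, names(tbl$table_body))
same_columns

# Columns that will appear in the printed table, with their labels
visible <- header[!header$hide, c("column", "label")]
visible
```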
footnote & footnote_abbrev
Each {gtsummary} table may contain a single footnote per header and cell within the table. Footnotes and footnote abbreviations are handled separately. Updates/changes to footnotes are appended to the bottom of the tibble. A footnote of NA_character_ deletes an existing footnote.
Column | Description |
---|---|
column | Column name from .$table_body |
rows | Expression selecting rows in .$table_body |
footnote | String containing footnote to add to column/row |
fmt_fun
Numeric columns/rows are styled with the functions stored in fmt_fun. Updates/changes to styling functions are appended to the bottom of the tibble.
Column | Description |
---|---|
column | Column name from .$table_body |
rows | Expression selecting rows in .$table_body |
fmt_fun | List of formatting/styling functions |
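The append-only behaviour can be observed by adding a formatting function and comparing row counts. This sketch assumes `modify_table_styling()` accepts `columns` and `fmt_fun` arguments, as in the releases this vignette describes; the 3-significant-figure formatter is just an example.

```r
library(gtsummary)

tbl_reg <- tbl_regression(lm(age ~ grade + marker, trial))
n_before <- nrow(tbl_reg$table_styling$fmt_fun)

# Register a new formatter for the estimate column
tbl_reg2 <- modify_table_styling(
  tbl_reg,
  columns = estimate,
  fmt_fun = function(x) style_sigfig(x, digits = 3)
)

# The new instruction is added below the existing ones
n_after <- nrow(tbl_reg2$table_styling$fmt_fun)
last_col <- tail(tbl_reg2$table_styling$fmt_fun$column, 1)
c(before = n_before, after = n_after)
```

Because earlier rows are kept rather than overwritten, the order of rows in the tibble records the order in which styling was applied.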
text_format
Columns/rows are styled with bold, italic, or indenting instructions stored in text_format. Updates/changes to styling functions are appended to the bottom of the tibble.
Column | Description |
---|---|
column | Column name from .$table_body |
rows | Expression selecting rows in .$table_body |
format_type | One of "bold", "italic", or "indent" |
undo_text_format | Logical indicating whether the formatting indicated should be undone/removed. |
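For instance, bolding the variable labels with `bold_labels()` leaves a record in text_format. A minimal sketch, with illustrative object names:

```r
library(gtsummary)

tbl <- tbl_summary(trial, include = c(age, grade))

# bold_labels() records a "bold" instruction for the label rows
tbl_bold <- bold_labels(tbl)
tbl_bold$table_styling$text_format

has_bold <- "bold" %in% tbl_bold$table_styling$text_format$format_type
has_bold
```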
fmt_missing
By default, all NA values are shown as blanks. Missing values in the selected columns/rows are replaced with the string stored in the symbol column. For example, reference rows in tbl_regression() are shown with an em-dash. Updates/changes to styling functions are appended to the bottom of the tibble.
Column | Description |
---|---|
column | Column name from .$table_body |
rows | Expression selecting rows in .$table_body |
symbol | String to replace missing values with, e.g. an em-dash |
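As an illustration, the sketch below asks for a placeholder in the otherwise empty statistic cell of a categorical variable's label row. It assumes `modify_table_styling()` exposes a `missing_symbol` argument, and the "---" placeholder is an arbitrary choice.

```r
library(gtsummary)

tbl <- tbl_summary(trial, include = grade)

# The label row of a categorical variable has an NA statistic;
# request a placeholder for it instead of a blank
tbl_dash <- modify_table_styling(
  tbl,
  columns = stat_0,
  rows = row_type == "label",
  missing_symbol = "---"
)

# The instruction is stored as the last row of fmt_missing
last_symbol <- tail(tbl_dash$table_styling$fmt_missing$symbol, 1)
last_symbol
```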
cols_merge
This object is experimental and may change in the future. This tibble gives instructions for merging columns into a single column. The implementation in as_gt() will be updated after gt::cols_label() gains a rows= argument.
Column | Description |
---|---|
column | Column name from .$table_body |
rows | Expression selecting rows in .$table_body |
pattern | Glue pattern directing how to combine/merge columns. The merged columns will replace the column indicated in 'column'. |
source_note
String that is made into a table source note. The attribute "text_interpret" is either "md" or "html".
caption
String that is made into the table caption. The attribute "text_interpret" is either "md" or "html".
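A caption set with `modify_caption()` can be read back together with its attribute. A minimal sketch; the caption text is an arbitrary example.

```r
library(gtsummary)

tbl <- tbl_summary(trial, include = c(age, grade))
tbl_cap <- modify_caption(tbl, "**Patient Characteristics**")

# The caption string carries a "text_interpret" attribute
cap <- tbl_cap$table_styling$caption
interp <- attr(cap, "text_interpret")
interp
```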
horizontal_line_above
Expression identifying a row where a horizontal line is placed above in the table.
Example from tbl_regression()
tbl_regression_ex$table_styling
#> $header
#> # A tibble: 24 x 7
#> column hide align interpret_label label interpret_spann~ spanning_header
#> <chr> <lgl> <chr> <chr> <chr> <chr> <chr>
#> 1 variable TRUE cent~ gt::md vari~ gt::md <NA>
#> 2 var_label TRUE cent~ gt::md var_~ gt::md <NA>
#> 3 var_type TRUE cent~ gt::md var_~ gt::md <NA>
#> 4 reference~ TRUE cent~ gt::md refe~ gt::md <NA>
#> 5 row_type TRUE cent~ gt::md row_~ gt::md <NA>
#> 6 header_row TRUE cent~ gt::md head~ gt::md <NA>
#> 7 N_obs TRUE cent~ gt::md N_obs gt::md <NA>
#> 8 N TRUE cent~ gt::md **N** gt::md <NA>
#> 9 coefficie~ TRUE cent~ gt::md coef~ gt::md <NA>
#> 10 coefficie~ TRUE cent~ gt::md coef~ gt::md <NA>
#> # ... with 14 more rows
#>
#> $footnote
#> # A tibble: 0 x 4
#> # ... with 4 variables: column <chr>, rows <list>, text_interpret <chr>,
#> # footnote <chr>
#>
#> $footnote_abbrev
#> # A tibble: 2 x 4
#> column rows text_interpret footnote
#> <chr> <list> <chr> <chr>
#> 1 ci <quosure> gt::md CI = Confidence Interval
#> 2 std.error <quosure> gt::md SE = Standard Error
#>
#> $text_format
#> # A tibble: 2 x 4
#> column rows format_type undo_text_format
#> <chr> <list> <chr> <lgl>
#> 1 label <language> indent FALSE
#> 2 p.value <quosure> bold FALSE
#>
#> $fmt_missing
#> # A tibble: 4 x 3
#> column rows symbol
#> <chr> <list> <chr>
#> 1 estimate <quosure> —
#> 2 ci <quosure> —
#> 3 std.error <quosure> —
#> 4 statistic <quosure> —
#>
#> $fmt_fun
#> # A tibble: 10 x 3
#> column rows fmt_fun
#> <chr> <list> <list>
#> 1 estimate <quosure> <fn>
#> 2 N <quosure> <fn>
#> 3 N_obs <quosure> <fn>
#> 4 n_obs <quosure> <fn>
#> 5 conf.low <quosure> <fn>
#> 6 conf.high <quosure> <fn>
#> 7 p.value <quosure> <fn>
#> 8 std.error <quosure> <prrr_fn_>
#> 9 statistic <quosure> <prrr_fn_>
#> 10 var_nlevels <quosure> <prrr_fn_>
#>
#> $cols_merge
#> # A tibble: 0 x 3
#> # ... with 3 variables: column <chr>, rows <list>, pattern <chr>
When constructing a {gtsummary} object, the author will begin with the .$table_body object. Recall the .$table_body data frame must include columns "label", "row_type", and "variable". Of these columns, only the "label" column will be printed with the final results. The "row_type" column typically controls whether or not the label column is indented. The "variable" column is often used in the inline_text() family of functions, and when merging {gtsummary} tables with tbl_merge().
tbl_regression_ex %>%
  purrr::pluck("table_body") %>%
  select(variable, row_type, label)
#> # A tibble: 5 x 3
#> variable row_type label
#> <chr> <chr> <chr>
#> 1 grade label Grade
#> 2 grade level I
#> 3 grade level II
#> 4 grade level III
#> 5 marker label Marker Level (ng/mL)
The other columns in .$table_body are created by the user and are likely printed in the output. Formatting and printing instructions for these columns are stored in .$table_styling.
There are a few {gtsummary} functions, some internal and some exported, to assist in constructing and modifying the .$table_styling list.
.create_gtsummary_object(table_body)
After a user creates a table_body, pass it to this function and the skeleton of a gtsummary object is created and returned (including the full table_styling list of tables).
.update_table_styling()
After columns are added or removed from table_body, run this function to update .$table_styling to include or remove styling instructions for the columns. Note that the default styling for each new column is to hide it.
modify_table_styling()
This exported function modifies the printing instructions for a single column or groups of columns.
modify_table_body()
This exported function helps users make changes to .$table_body. The function runs .update_table_styling() internally to maintain internal validity with the printing instructions.
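Putting these helpers together, a custom object could be assembled roughly as follows. This is a sketch under assumptions: `.create_gtsummary_object()` is internal, so it is reached with `:::` and may change without notice, and the `estimate` column with its values is made up purely for illustration.

```r
library(gtsummary)

# A hand-made table_body: the three required columns plus one
# hypothetical statistic column, `estimate`
table_body <- dplyr::tibble(
  variable = c("age", "marker"),
  row_type = "label",
  label    = c("Age", "Marker Level"),
  estimate = c(1.23, 4.56)
)

# Internal constructor (not exported): builds the skeleton object
tbl_custom <- gtsummary:::.create_gtsummary_object(table_body)

# New columns are hidden by default, so unhide `estimate` and label it
tbl_custom <- modify_table_styling(
  tbl_custom,
  columns = estimate,
  hide = FALSE,
  label = "**Estimate**"
)

hdr <- tbl_custom$table_styling$header
estimate_hidden <- hdr$hide[hdr$column == "estimate"]
estimate_hidden
```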
All {gtsummary} objects are printed with print.gtsummary(). Before a {gtsummary} object is printed, it is converted to a {gt} object using as_gt(). This function takes the {gtsummary} object as its input, and uses the information in .$table_styling to construct a list of {gt} calls that will be executed on .$table_body. After the {gtsummary} object is converted to {gt}, it is then printed as any other {gt} object.
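The list of {gt} calls can be inspected without executing them by passing `return_calls = TRUE` to `as_gt()`. A minimal sketch; the names and number of calls depend on the table and the installed release.

```r
library(gtsummary)

tbl <- tbl_summary(trial, include = c(age, grade))

# Return the unevaluated {gt} calls instead of a {gt} table
gt_calls <- as_gt(tbl, return_calls = TRUE)
names(gt_calls)
```

Reading through these calls is a practical way to see how each piece of .$table_styling maps onto a {gt} function.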
In some cases, the package defaults to printing with other engines, such as flextable (as_flex_table()), huxtable (as_hux_table()), kableExtra (as_kable_extra()), and kable (as_kable()). The default print engine is set with the theme element "pkgwide-str:print_engine".
While the actual print function is slightly more involved, it is basically this:
print.gtsummary <- function(x) {
  get_theme_element("pkgwide-str:print_engine") %>%
    switch(
      "gt" = as_gt(x),
      "flextable" = as_flex_table(x),
      "huxtable" = as_hux_table(x),
      "kable_extra" = as_kable_extra(x),
      "kable" = as_kable(x)
    ) %>%
    print()
}
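The same conversions can be called by hand, which bypasses the theme-based dispatch. The sketch below assumes the {gt} and {knitr} packages are installed and converts one table with two of the engines.

```r
library(gtsummary)

tbl <- tbl_summary(trial, include = c(age, grade))

# Convert explicitly rather than relying on the default print engine
gt_tbl <- as_gt(tbl)
kable_tbl <- as_kable(tbl)

class(gt_tbl)
class(kable_tbl)
```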
.$meta_data$df_stats tibble
Some {gtsummary} tables contain an internal object called .$meta_data containing a list column called "df_stats". The column is a list of tibbles with each tibble containing the summary statistics presented in the final gtsummary table. While the statistics contained in each "df_stats" tibble can vary within a single gtsummary object, all the tibbles have a few common characteristics.
Each tibble contains the following columns:
Column | Description |
---|---|
|
String of the variable name |
|
String matching the variable's values in |
|
The column name the statistics appear under in |
|
This column appears if and only if the variable being summarized has multiple levels. The column is equal to the variable's levels. |
|
Primarily, the tibble stores the summary statistics for each variable. For example, when the mean is requested in |
The statistics columns each have an attribute called "fmt_fun" containing the formatting function that will be applied before the statistic is placed in .$table_body.