-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathmissing_values.Rmd
140 lines (119 loc) · 3.53 KB
/
missing_values.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
---
title: "Missing Values in Tern"
date: "2022-04-12"
output:
rmarkdown::html_document:
theme: "spacelab"
highlight: "kate"
toc: true
toc_float: true
vignette: >
%\VignetteIndexEntry{Missing Values in Tern}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
editor_options:
markdown:
wrap: 72
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
The packages used in this vignette are:
```{r, message=FALSE}
library(rtables)
library(formatters)
library(tern)
library(dplyr)
```
## Variable Class Conversion
`rtables` requires that split variables to be factors. When you try and split a variable that
isn't, a warning message will appear. Here we purposefully convert the SEX variable to character
to demonstrate what happens when we try splitting the rows by this variable. To fix this,
`df_explict_na` will convert this to a factor resulting in the table being generated.
```{r}
adsl <- tern_ex_adsl
adsl$SEX <- as.factor(adsl$SEX)
vars <- c("AGE", "SEX", "RACE", "BMRKR1")
var_labels <- c(
"Age (yr)",
"Sex",
"Race",
"Continous Level Biomarker 1"
)
result <- basic_table(show_colcounts = TRUE) %>%
split_cols_by(var = "ARM") %>%
add_overall_col("All Patients") %>%
analyze_vars(
vars = vars,
var_labels = var_labels
) %>%
build_table(adsl)
result
```
## Including Missing Values in `rtables`
Here we purposefully convert all `M` values to `NA` in the `SEX` variable.
After running `df_explicit_na` the `NA` values are encoded as `<Missing>` but they are not
included in the table. As well, the missing values are not included in the `n` count and they
are not included in the denominator value for calculating the percent values.
```{r}
adsl <- tern_ex_adsl
adsl$SEX[adsl$SEX == "M"] <- NA
adsl <- df_explicit_na(adsl)
vars <- c("AGE", "SEX")
var_labels <- c(
"Age (yr)",
"Sex"
)
result <- basic_table(show_colcounts = TRUE) %>%
split_cols_by(var = "ARM") %>%
add_overall_col("All Patients") %>%
analyze_vars(
vars = vars,
var_labels = var_labels
) %>%
build_table(adsl)
result
```
If you want the `Na` values to be displayed in the table and included in the `n` count
and as the denominator for calculating percent values, use the `na_level` argument.
```{r}
adsl <- tern_ex_adsl
adsl$SEX[adsl$SEX == "M"] <- NA
adsl <- df_explicit_na(adsl, na_level = "Missing Values")
result <- basic_table(show_colcounts = TRUE) %>%
split_cols_by(var = "ARM") %>%
add_overall_col("All Patients") %>%
analyze_vars(
vars = vars,
var_labels = var_labels
) %>%
build_table(adsl)
result
```
## Missing Values in Numeric Variables
Numeric variables that have missing values are not altered. This means that any `NA` value in
a numeric variable will not be included in the summary statistics, nor will they be included
in the denominator value for calculating the percent values. Here we make any value less than 30
missing in the `AGE` variable and only the valued greater than 30 are included in the table below.
```{r}
adsl <- tern_ex_adsl
adsl$AGE[adsl$AGE < 30] <- NA
adsl <- df_explicit_na(adsl)
vars <- c("AGE", "SEX")
var_labels <- c(
"Age (yr)",
"Sex"
)
result <- basic_table(show_colcounts = TRUE) %>%
split_cols_by(var = "ARM") %>%
add_overall_col("All Patients") %>%
analyze_vars(
vars = vars,
var_labels = var_labels
) %>%
build_table(adsl)
result
```