Multiple Imputation and Bootstrapping - Method boot_MI

Martijn W Heymans

2021-09-23

Introduction

This page describes the boot_MI method that is implemented in the psfmi_validate function of the psfmi package.

Method boot_MI

The method follows the internal validation procedure of the validate function in the rms package for complete data, but now within the context of multiply imputed data. With the method boot_MI, bootstrap samples are first drawn from the original incomplete dataset and then multiple imputation is applied in each of these incomplete bootstrap samples. The model is fitted and pooled in each bootstrap sample (training data) and subsequently tested in the original multiply imputed data to determine the amount of optimism. The method can be performed in combination with backward or forward selection.

These steps are visualized in the figure below.

Schematic overview of the boot_MI method
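
The steps in the figure can be sketched in a few lines of base R. This is only an illustration of the idea and not the psfmi implementation: the data are simulated, the helper names `impute_once` and `auc` are hypothetical, a random draw from the observed values stands in for the imputation that mice would perform, and performance is pooled by a plain average over imputed datasets.

```r
# Minimal base-R sketch of the boot_MI idea (NOT the psfmi implementation).
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(-0.5 + x1 + 0.5 * x2))
dat <- data.frame(y, x1, x2)
dat$x2[sample(n, 40)] <- NA   # make x2 incomplete

# Hypothetical stand-in for mice: fill NAs with random draws from observed values
impute_once <- function(d) {
  miss <- is.na(d$x2)
  d$x2[miss] <- sample(d$x2[!miss], sum(miss), replace = TRUE)
  d
}

# AUC via the rank (Mann-Whitney) formula
auc <- function(p, y) {
  r  <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

nimp  <- 3
nboot <- 5

# Multiply imputed version of the original data (test data)
orig_imp <- lapply(seq_len(nimp), function(i) impute_once(dat))

res <- t(sapply(seq_len(nboot), function(b) {
  boot     <- dat[sample(n, replace = TRUE), ]                       # 1. bootstrap the incomplete data
  boot_imp <- lapply(seq_len(nimp), function(i) impute_once(boot))   # 2. impute the bootstrap sample
  fits     <- lapply(boot_imp, function(d) glm(y ~ x1 + x2, family = binomial, data = d))
  beta     <- rowMeans(sapply(fits, coef))                           # 3. pool the coefficients
  lp       <- function(d) drop(cbind(1, d$x1, d$x2) %*% beta)        # linear predictor of pooled model
  c(app  = mean(sapply(boot_imp, function(d) auc(lp(d), d$y))),      # 4. apparent performance
    test = mean(sapply(orig_imp, function(d) auc(lp(d), d$y))))      # 5. test in original imputed data
}))

# Optimism: average apparent minus average test performance
optimism <- mean(res[, "app"]) - mean(res[, "test"])
```

In psfmi_validate the imputation step is handled by mice and the pooling follows the package's own rules, so this sketch should be read as a description of the order of operations only.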

Examples

Method boot_MI

Internal validation is done on the last model selected by the function psfmi_lr. In the example below, psfmi_lr is used with p.crit set to 1, and the same setting is used in the psfmi_validate function. This means that the full model is first pooled and internal validation is then done on this full model, without variable selection.

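library(psfmi)

# Pool the full model over 5 imputed datasets; with p.crit = 1 no predictors are removed
pool_lr <- psfmi_lr(data=lbpmilr, formula = Chronic ~ Pain + JobDemands + rcs(Tampascale, 3) +
                   factor(Satisfaction) + Smoking, p.crit = 1, direction="FW",
                 nimp=5, impvar="Impnr", method="D1")

# Bootstrap the original incomplete data (lbp_orig), impute each bootstrap sample
# 3 times (nimp_mice) and validate without variable selection (p.crit = 1)
set.seed(100)
res_MI_boot <- psfmi_validate(pool_lr, val_method = "boot_MI", data_orig = lbp_orig, nboot = 5,
                     p.crit=1, nimp_mice = 3, direction = "BW", miceImp = miceImp,
                     printFlag = FALSE)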
## 
## Boot 1
## 
## Boot 2
## 
## Boot 3
## 
## Boot 4
## 
## Boot 5
## 
## p.crit = 1, validation is done without variable selection
res_MI_boot
## $stats_val
##                   Orig  Apparent      Test   Optimism Corrected
## AUC          0.8871000 0.8916400 0.8802000 0.01144000 0.8756600
## R2           0.5605521 0.5783181 0.5378758 0.04044230 0.5201098
## Brier Scaled 0.4514569 0.4629364 0.4117878 0.05114854 0.4003084
## Slope        1.0000000 1.0000000 0.9050770 0.09492303 0.9050770
## 
## $intercept_test
##  intercept 
## -0.1418631 
## 
## $res_boot
##        ROC_app ROC_test    R2_app   R2_test Brier_sc_app Brier_sc_test
## Boot 1  0.8431   0.8777 0.4443306 0.5289039    0.3356881     0.4126126
## Boot 2  0.9131   0.8817 0.6384560 0.5483116    0.5150413     0.4217096
## Boot 3  0.8911   0.8822 0.5923924 0.5458272    0.4735383     0.4225408
## Boot 4  0.9104   0.8864 0.6219550 0.5501550    0.5026123     0.4078803
## Boot 5  0.9005   0.8730 0.5944567 0.5161814    0.4878018     0.3941958
##          intercept     Slope
## Boot 1 -0.11704446 1.2052458
## Boot 2 -0.18244542 0.7159558
## Boot 3 -0.05277384 0.8674005
## Boot 4 -0.40024757 0.8972059
## Boot 5  0.04319589 0.8395768
## 
## $predictors_selected
##        Pain JobDemands Smoking factor(Satisfaction) rcs(Tampascale,3)
## Boot 1    1          1       1                    1                 1
## Boot 2    1          1       1                    1                 1
## Boot 3    1          1       1                    1                 1
## Boot 4    1          1       1                    1                 1
## Boot 5    1          1       1                    1                 1
## 
## $model_orig
## Chronic ~ Pain + JobDemands + Smoking + factor(Satisfaction) + 
##     rcs(Tampascale, 3)
## <environment: 0x00000000205b0d40>
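
The columns of $stats_val can be traced back to $res_boot. The printed numbers are consistent with Apparent and Test being the averages over the bootstrap samples, Optimism being Apparent minus Test, and Corrected being Orig minus Optimism. The short check below uses values copied from the output above; the variable names are chosen here for illustration only.

```r
# Values copied from the $res_boot and $stats_val output above
roc_app  <- c(0.8431, 0.9131, 0.8911, 0.9104, 0.9005)   # ROC_app per bootstrap sample
roc_test <- c(0.8777, 0.8817, 0.8822, 0.8864, 0.8730)   # ROC_test per bootstrap sample
slope    <- c(1.2052458, 0.7159558, 0.8674005, 0.8972059, 0.8395768)
auc_orig <- 0.8871                                      # "Orig" AUC

apparent  <- mean(roc_app)        # 0.89164
test      <- mean(roc_test)       # 0.88020
optimism  <- apparent - test      # 0.01144
corrected <- auc_orig - optimism  # 0.87566

# The corrected slope is the average test slope over the bootstrap samples
slope_corrected <- mean(slope)    # 0.9050770 after rounding to 7 decimals

round(c(Apparent = apparent, Test = test,
        Optimism = optimism, Corrected = corrected), 5)
```

The same arithmetic reproduces the R2 and Brier Scaled rows when their columns from $res_boot are used.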


Method boot_MI including BW selection

Internal validation is done on the last model selected by the function psfmi_lr. In the example below, psfmi_lr is used with p.crit set to 1, so the full model is pooled. Internal validation is then done with the psfmi_validate function including BW selection by setting p.crit=0.05. BW selection is then applied in each bootstrap sample, starting from the full model of pool_lr. In this way, shrinkage of models can be determined including backward selection of variables. This gives a fair shrinkage factor, because variable selection is responsible for a large amount of overfitting in coefficients.

library(psfmi)

# Pool the full model (p.crit = 1), here specified with predictor arguments instead of a formula
pool_lr <- psfmi_lr(data=lbpmilr, Outcome="Chronic", predictors = c("Pain", "JobDemands", "Smoking"), 
                   cat.predictors = "Satisfaction", spline.predictors = "Tampascale", nknots=3,
                   p.crit = 1, direction="FW", nimp=5, impvar="Impnr", method="D1")

# Validate with backward selection (p.crit = 0.05) repeated in each bootstrap sample
set.seed(100)
res_MI_boot <- psfmi_validate(pool_lr, val_method = "boot_MI", data_orig = lbp_orig, nboot = 5,
                     p.crit=0.05, nimp_mice = 3, direction = "BW", miceImp = miceImp,
                     printFlag = FALSE)
## 
## Boot 1
## Removed at Step 1 is - JobDemands
## Removed at Step 2 is - Smoking
## Removed at Step 3 is - rcs(Tampascale,3)
## 
## Selection correctly terminated, 
## No more variables removed from the model
## 
## Boot 2
## Removed at Step 1 is - JobDemands
## Removed at Step 2 is - Smoking
## Removed at Step 3 is - rcs(Tampascale,3)
## 
## Selection correctly terminated, 
## No more variables removed from the model
## 
## Boot 3
## Removed at Step 1 is - JobDemands
## Removed at Step 2 is - Smoking
## Removed at Step 3 is - Pain
## 
## Selection correctly terminated, 
## No more variables removed from the model
## 
## Boot 4
## Removed at Step 1 is - Smoking
## Removed at Step 2 is - JobDemands
## Removed at Step 3 is - rcs(Tampascale,3)
## 
## Selection correctly terminated, 
## No more variables removed from the model
## 
## Boot 5
## Removed at Step 1 is - JobDemands
## Removed at Step 2 is - Smoking
## 
## Selection correctly terminated, 
## No more variables removed from the model
## Removed at Step 1 is - Smoking
## Removed at Step 2 is - JobDemands
## Removed at Step 3 is - rcs(Tampascale,3)
## 
## Selection correctly terminated, 
## No more variables removed from the model
res_MI_boot
## $stats_val
##                   Orig  Apparent      Test   Optimism Corrected
## AUC          0.8730000 0.8750000 0.8704400 0.00456000 0.8684400
## R2           0.5244014 0.5353315 0.5160330 0.01929853 0.5051029
## Brier Scaled 0.4384749 0.4285873 0.4076258 0.02096155 0.4175134
## Slope        1.0000000 1.0000000 0.9649318 0.03506820 0.9649318
## 
## $intercept_test
##  intercept 
## -0.0995169 
## 
## $res_boot
##        ROC_app ROC_test    R2_app   R2_test Brier_sc_app Brier_sc_test
## Boot 1  0.8230   0.8669 0.4070467 0.5121114    0.3136382     0.4027844
## Boot 2  0.9034   0.8722 0.6093501 0.5213792    0.4866593     0.4297398
## Boot 3  0.8684   0.8568 0.5243855 0.4775389    0.4131725     0.3481191
## Boot 4  0.8933   0.8730 0.5795163 0.5227525    0.4873638     0.4282208
## Boot 5  0.8869   0.8833 0.5563589 0.5463829    0.4421028     0.4292647
##          intercept     Slope
## Boot 1 -0.14270301 1.2315078
## Boot 2 -0.01465516 0.7948917
## Boot 3 -0.19316264 0.8346061
## Boot 4 -0.19130914 0.9528790
## Boot 5  0.04424543 1.0107745
## 
## $predictors_selected
##        Pain JobDemands Smoking factor(Satisfaction) rcs(Tampascale,3)
## Boot 1    1          0       0                    1                 0
## Boot 2    1          0       0                    1                 0
## Boot 3    0          0       0                    1                 1
## Boot 4    1          0       0                    1                 0
## Boot 5    1          0       0                    1                 1
## 
## $model_orig
## Chronic ~ Pain + factor(Satisfaction)
## <environment: 0x0000000025e11218>
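
The $predictors_selected table shows which predictors survived backward selection in each bootstrap sample. One way to summarize it is the proportion of bootstrap samples in which each predictor was retained. The sketch below rebuilds the table from the output above so that it runs on its own; the object name `sel` is chosen here for illustration.

```r
# 0/1 selection indicators copied from the $predictors_selected output above
sel <- matrix(c(1, 0, 0, 1, 0,
                1, 0, 0, 1, 0,
                0, 0, 0, 1, 1,
                1, 0, 0, 1, 0,
                1, 0, 0, 1, 1),
              nrow = 5, byrow = TRUE,
              dimnames = list(paste("Boot", 1:5),
                              c("Pain", "JobDemands", "Smoking",
                                "factor(Satisfaction)", "rcs(Tampascale,3)")))

# Proportion of bootstrap samples in which each predictor was retained
sel_freq <- colMeans(sel)
sel_freq
# Pain 0.8, JobDemands 0, Smoking 0, factor(Satisfaction) 1, rcs(Tampascale,3) 0.4
```

With only 5 bootstrap samples these proportions are very coarse; a larger nboot would give a more stable picture of how often each predictor is selected.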
