Multiple Imputation and Cross-validation - The MI_cv_naive method

Martijn W Heymans

2021-09-23

Introduction

This page describes the MI_cv_naive method implemented in the psfmi package, which combines Multiple Imputation with cross-validation to validate logistic regression prediction models. The method is available through the function psfmi_validate. An explanation and usage examples can be found below.

Method MI_cv_naive

This method applies cross-validation after Multiple Imputation. The same folds are used in each multiply imputed dataset. It is possible to perform backward selection during cross-validation. The Figure below visualizes how this method works.

Schematic overview of the MI_cv_naive method
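
The idea of reusing the same folds in every imputed dataset can be illustrated with a small base R sketch. This is not the psfmi implementation: the toy data, the `id` and `fold` columns and the helper `split_fold` are hypothetical and only serve to show the principle. The imputation indicator is named Impnr to match the examples on this page.

```r
# Hypothetical toy data: 3 imputed copies of the same 20 subjects, stacked in
# long format with an imputation indicator (Impnr).
set.seed(1)
n     <- 20
nimp  <- 3
folds <- 5

imp_data <- do.call(rbind, lapply(seq_len(nimp), function(m) {
  data.frame(Impnr = m, id = seq_len(n),
             x = rnorm(n), y = rbinom(n, 1, 0.5))
}))

# Draw the fold assignment once, per subject, not per imputed dataset.
fold_id <- sample(rep(seq_len(folds), length.out = n))

# Reuse that single assignment in every imputed dataset.
imp_data$fold <- fold_id[imp_data$id]

# Check: each subject sits in the same fold in all imputed datasets.
same_folds <- all(tapply(imp_data$fold, imp_data$id,
                         function(f) length(unique(f)) == 1))

# Hypothetical helper: train/test split for fold k in imputed dataset m.
split_fold <- function(data, m, k) {
  d <- data[data$Impnr == m, ]
  list(train = d[d$fold != k, ], test = d[d$fold == k, ])
}

s <- split_fold(imp_data, m = 1, k = 1)
nrow(s$train)
nrow(s$test)
```

Because the fold label is attached to the subject rather than to the row of one imputed dataset, a subject is in the test set of fold k in every imputed dataset, so the performance measures of one fold refer to the same persons across imputations.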

Examples

Method MI_cv_naive

To run the MI_cv_naive method use:

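library(psfmi)

# Pool the logistic regression model over 5 imputed datasets (nimp = 5),
# which are identified by the Impnr variable in lbpmilr.
# p.crit = 1 keeps all predictors in the model (no variables are removed).
pool_lr <- psfmi_lr(data=lbpmilr, formula = Chronic ~ Pain + JobDemands + rcs(Tampascale, 3) +
                   factor(Satisfaction) + Smoking, p.crit = 1, direction="BW",
                   nimp=5, impvar="Impnr", method="D1")

# Set a seed so that the cross-validation results are reproducible.
set.seed(100)
# 5-fold cross-validation in each imputed dataset, without backward selection.
res_cv <- psfmi_validate(pool_lr, val_method = "MI_cv_naive", folds = 5,
                     p.crit=1, BW=FALSE)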
## 
## Imputation 1
## 
## Imputation 2
## 
## Imputation 3
## 
## Imputation 4
## 
## Imputation 5
res_cv
## $cv_stats
##                  Train      Test
## AUC          0.8913819 0.8514000
## Brier scaled 0.4594704 0.3574320
## R-squared    0.5686534 0.5004528
## 
## $auc_test
##                     95% Low C-statistic 95% Up
## C-statistic (logit)  0.7672      0.8514 0.9088
## 
## $test_coef
##   Intercept       Slope 
## -0.08777285  0.83748525
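
The output reports discrimination (AUC) and an intercept and slope in the test data. As a rough illustration of what such quantities can look like for a single train/test split, the base R sketch below fits a model on training data, computes the linear predictor in the test data and then derives an AUC and a calibration intercept and slope. It assumes that `$test_coef` is a calibration intercept and slope obtained by regressing the outcome on the linear predictor, which is an interpretation and not confirmed by this page. The simulated data and the `auc` helper are hypothetical.

```r
# Hypothetical simulated data with one predictor.
set.seed(2)
n   <- 400
x   <- rnorm(n)
y   <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))
dat <- data.frame(x = x, y = y)

# One train/test split (every fifth row is test data).
test_rows <- seq_len(n) %% 5 == 0
train <- dat[!test_rows, ]
test  <- dat[test_rows, ]

# Fit the model on the training data only.
fit <- glm(y ~ x, family = binomial, data = train)

# Linear predictor of the training model, evaluated in the test data.
lp_test <- predict(fit, newdata = test, type = "link")

# Calibration model: regress the observed test outcome on the linear predictor.
cal <- glm(test$y ~ lp_test, family = binomial)
test_coef <- setNames(coef(cal), c("Intercept", "Slope"))

# AUC from the rank-sum (Mann-Whitney) formula.
auc <- function(y, p) {
  r  <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
auc_test <- auc(test$y, lp_test)

test_coef
auc_test
```

Under this interpretation, a slope below 1 in the test data indicates that predictions from the training model are too extreme for new data, and the gap between the Train and Test columns gives an impression of the optimism in the apparent performance.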

Method MI_cv_naive including BW selection

To run the MI_cv_naive method with backward variable selection applied during cross-validation, use:

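library(psfmi)

# Pool the full model over 5 imputed datasets (nimp = 5).
# p.crit = 1 keeps all predictors at this stage; selection happens during validation.
pool_lr <- psfmi_lr(data=lbpmilr, formula = Chronic ~ Pain + JobDemands + rcs(Tampascale, 3) +
                   factor(Satisfaction) + Smoking, p.crit = 1, direction="BW",
                   nimp=5, impvar="Impnr", method="D1")

# Set a seed so that the cross-validation results are reproducible.
set.seed(100)
# 5-fold cross-validation with backward selection (BW = TRUE) at p.crit = 0.05.
res_cv <- psfmi_validate(pool_lr, val_method = "MI_cv_naive", folds = 5,
                     p.crit=0.05, BW=TRUE)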
## 
## Imputation 1
## Removed at Step 1 is - Smoking
## Removed at Step 2 is - JobDemands
## Removed at Step 3 is - rcs(Tampascale,3)
## 
## Selection correctly terminated, 
## No more variables removed from the model
## Removed at Step 1 is - JobDemands
## Removed at Step 2 is - Smoking
## Removed at Step 3 is - rcs(Tampascale,3)
## 
## Selection correctly terminated, 
## No more variables removed from the model
## Removed at Step 1 is - JobDemands
## Removed at Step 2 is - Smoking
## Removed at Step 3 is - rcs(Tampascale,3)
## 
## Selection correctly terminated, 
## No more variables removed from the model
## Removed at Step 1 is - Smoking
## Removed at Step 2 is - JobDemands
## 
## Selection correctly terminated, 
## No more variables removed from the model
## Removed at Step 1 is - JobDemands
## Removed at Step 2 is - Smoking
## Removed at Step 3 is - rcs(Tampascale,3)
## 
## Selection correctly terminated, 
## No more variables removed from the model
## 
## Imputation 2
## Removed at Step 1 is - Smoking
## Removed at Step 2 is - JobDemands
## Removed at Step 3 is - rcs(Tampascale,3)
## 
## Selection correctly terminated, 
## No more variables removed from the model
## Removed at Step 1 is - JobDemands
## Removed at Step 2 is - Smoking
## Removed at Step 3 is - rcs(Tampascale,3)
## 
## Selection correctly terminated, 
## No more variables removed from the model
## Removed at Step 1 is - JobDemands
## Removed at Step 2 is - Smoking
## Removed at Step 3 is - rcs(Tampascale,3)
## 
## Selection correctly terminated, 
## No more variables removed from the model
## Removed at Step 1 is - JobDemands
## Removed at Step 2 is - Smoking
## Removed at Step 3 is - rcs(Tampascale,3)
## 
## Selection correctly terminated, 
## No more variables removed from the model
## Removed at Step 1 is - Smoking
## Removed at Step 2 is - JobDemands
## 
## Selection correctly terminated, 
## No more variables removed from the model
## 
## Imputation 3
## Removed at Step 1 is - Smoking
## Removed at Step 2 is - JobDemands
## 
## Selection correctly terminated, 
## No more variables removed from the model
## Removed at Step 1 is - Smoking
## Removed at Step 2 is - JobDemands
## Removed at Step 3 is - rcs(Tampascale,3)
## 
## Selection correctly terminated, 
## No more variables removed from the model
## Removed at Step 1 is - JobDemands
## Removed at Step 2 is - Smoking
## 
## Selection correctly terminated, 
## No more variables removed from the model
## Removed at Step 1 is - JobDemands
## Removed at Step 2 is - Smoking
## 
## Selection correctly terminated, 
## No more variables removed from the model
## Removed at Step 1 is - Smoking
## Removed at Step 2 is - JobDemands
## Removed at Step 3 is - rcs(Tampascale,3)
## 
## Selection correctly terminated, 
## No more variables removed from the model
## 
## Imputation 4
## Removed at Step 1 is - Smoking
## Removed at Step 2 is - JobDemands
## 
## Selection correctly terminated, 
## No more variables removed from the model
## Removed at Step 1 is - JobDemands
## Removed at Step 2 is - Smoking
## Removed at Step 3 is - rcs(Tampascale,3)
## 
## Selection correctly terminated, 
## No more variables removed from the model
## Removed at Step 1 is - Smoking
## Removed at Step 2 is - JobDemands
## Removed at Step 3 is - rcs(Tampascale,3)
## 
## Selection correctly terminated, 
## No more variables removed from the model
## Removed at Step 1 is - JobDemands
## Removed at Step 2 is - Smoking
## 
## Selection correctly terminated, 
## No more variables removed from the model
## Removed at Step 1 is - Smoking
## Removed at Step 2 is - JobDemands
## 
## Selection correctly terminated, 
## No more variables removed from the model
## 
## Imputation 5
## Removed at Step 1 is - Smoking
## Removed at Step 2 is - JobDemands
## Removed at Step 3 is - rcs(Tampascale,3)
## 
## Selection correctly terminated, 
## No more variables removed from the model
## Removed at Step 1 is - JobDemands
## Removed at Step 2 is - Smoking
## Removed at Step 3 is - rcs(Tampascale,3)
## 
## Selection correctly terminated, 
## No more variables removed from the model
## Removed at Step 1 is - Smoking
## Removed at Step 2 is - JobDemands
## Removed at Step 3 is - rcs(Tampascale,3)
## 
## Selection correctly terminated, 
## No more variables removed from the model
## Removed at Step 1 is - JobDemands
## Removed at Step 2 is - Smoking
## 
## Selection correctly terminated, 
## No more variables removed from the model
## Removed at Step 1 is - Smoking
## Removed at Step 2 is - JobDemands
## Removed at Step 3 is - rcs(Tampascale,3)
## 
## Selection correctly terminated, 
## No more variables removed from the model
res_cv
## $cv_stats
##                  Train      Test
## AUC          0.8804827 0.8575000
## Brier scaled 0.4470350 0.3832230
## R-squared    0.5445034 0.5053516
## 
## $auc_test
##                     95% Low C-statistic 95% Up
## C-statistic (logit)  0.7704      0.8575 0.9152
## 
## $test_coef
##  Intercept      Slope 
## -0.1025916  0.9374556
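
The selection log above shows that the set of removed variables can differ between folds (for example, rcs(Tampascale,3) is removed in some folds and retained in others). The base R sketch below illustrates why: backward selection is rerun in every training fold, so each fold can end with a different model. This is a simplified single-dataset illustration, not the psfmi algorithm. The helper `backward_select`, the simulated data and the use of likelihood ratio tests from `drop1` are assumptions made for the example.

```r
# Hypothetical helper: p-value based backward elimination for a logistic model.
backward_select <- function(data, outcome, predictors, p_crit = 0.05) {
  repeat {
    rhs <- if (length(predictors) > 0) paste(predictors, collapse = " + ") else "1"
    fit <- glm(as.formula(paste(outcome, "~", rhs)),
               family = binomial, data = data)
    if (length(predictors) == 0) break

    # Likelihood ratio test for dropping each term; first row is "<none>".
    tests <- drop1(fit, test = "Chisq")
    p <- tests[["Pr(>Chi)"]][-1]
    names(p) <- rownames(tests)[-1]

    # Stop when every remaining term is significant at p_crit.
    if (max(p) <= p_crit) break

    # Otherwise remove the least significant term and refit.
    predictors <- setdiff(predictors, names(p)[which.max(p)])
  }
  list(fit = fit, kept = predictors)
}

# Hypothetical simulated data: two real predictors and one noise variable.
set.seed(3)
n   <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise = rnorm(n))
dat$y <- rbinom(n, 1, plogis(1.5 * dat$x1 - 1.5 * dat$x2))

# Run the selection separately in each training fold.
fold_id <- sample(rep(1:5, length.out = n))
kept_per_fold <- lapply(1:5, function(k) {
  train <- dat[fold_id != k, ]
  backward_select(train, "y", c("x1", "x2", "noise"))$kept
})
kept_per_fold
```

Repeating the selection inside each training fold means that the test performance also reflects the instability of the selection procedure, rather than the performance of one model selected once on all data.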
