solution_terms of project() to fix a test failure on R versions > 4.1.
cv_varsel() with nloo < n where n denotes the number of observations. (GitHub: #94, #252, commit feea39e)
validate_search = FALSE in cv_varsel().
nclusters (= 1) and nclusters_pred (= 5) of varsel() and cv_varsel() were set internally (the user-visible defaults were NULL). Now, nclusters and ndraws_pred (note the ndraws_pred, not nclusters_pred) have non-NULL user-visible defaults of 20 and 400, respectively. In general, this increases the runtime of these functions a lot. With respect to cv_varsel(), the new vignette (see vignettes) mentions two ways to quickly obtain some rough preliminary results which in general should not be used as final results, though: (i) varsel() and (ii) cv_varsel() with validate_search = FALSE (which only takes effect for cv_method = "LOO"). (GitHub: #291 and several commits beforehand, in particular bbd0f0a, babe031, 4ef95d3, and ce7d1e0)
proj_linpred() and proj_predict(), arguments nterms, ndraws, and seed have been removed to allow the user to pass them to project(). New arguments filter_nterms, nresample_clusters, and .seed have been introduced (see the documentation for details). (GitHub: #92, #135)
proj_linpred(), dimensions are not dropped anymore (i.e., output elements pred and lpd are always S x N matrices now). (GitHub: #143)
integrated = TRUE, proj_linpred() now averages the LPD (across the projected posterior draws) instead of taking the LPD at the averaged linear predictors. (GitHub: #143)
newdata does not contain the response variable, proj_linpred() now returns NULL for output element lpd. (GitHub: #143)
stanreg (from package rstanarm) with offsets to have these offsets specified via an offset() term in the model formula (and not via argument offset).
NULL to a user-visible value (and NULL is not allowed anymore).
data of get_refmodel.stanreg() has been removed. (GitHub: #219)
div_minimizer of init_refmodel() now always needs to return a list of submodels (see the documentation for details). Correspondingly, the function passed to argument proj_predfun of init_refmodel() can now always expect a list as input for argument fits (see the documentation for details). (GitHub: #230)
proj_predfun of init_refmodel() now always needs to return a matrix (see the documentation for details). (GitHub: #230)
?`projpred-package`. (GitHub: #235)
Student_t() family is regarded as experimental. Therefore, a corresponding warning is thrown when creating the reference model. (GitHub: #233, #252)
Gamma() family is regarded as experimental. Therefore, a corresponding warning is thrown when creating the reference model. (GitHub: paul-buerkner/brms#1255, #240, #252)
init_refmodel() in case of argument dis being NULL (the default) was dangerous for custom reference models with a family having a dispersion parameter (in that case, dis values of all-zeros were used silently). The new behavior now requires a non-NULL argument dis in that case. (GitHub: #254)
cv_search has been renamed to refit_prj. (GitHub: #154, #265)
as.matrix.projection() has gained a new argument nm_scheme which allows to choose the naming scheme for the column names of the returned matrix. The default ("auto") follows the naming scheme of the reference model fit (and uses the "rstanarm" naming scheme if the reference model fit is of an unknown class). (GitHub: #82, #279)
seed (and .seed) arguments now have a default of sample.int(.Machine$integer.max, 1) instead of NULL. Furthermore, the value supplied to these arguments is now used to generate new seeds internally on-the-fly. In many cases, this will change results compared to older projpred versions. Also note that now, the internal seeds are never fixed to a specific value if seed (and .seed) arguments are set to NULL. (GitHub: #84, #286)
as.matrix.projection() method now also returns the estimated group-level effects themselves. (GitHub: #75)
as.matrix.projection() method now returns the variance components (population SD(s) and population correlation(s)) instead of the empirical SD(s) of the group-level effects. (GitHub: #74)
README file. (GitHub: #245)
nclusters_pred was removed. (GitHub: commit 5062f2f)
project(): Warn if elements of solution_terms are not found in the reference model (and therefore ignored). (GitHub: #140)
get_refmodel.default() now passes arguments via the ellipsis (...) to init_refmodel(). (GitHub: #153, commit dd3716e)
init_refmodel(): The default (NULL) for argument extract_model_data has been removed as it wasn’t meaningful anyway. (GitHub: #219)
folds of init_refmodel() has been removed as it was effectively unused. (GitHub: #220)
solution_terms(). This allowed the introduction of a solution_terms.projection() method. (GitHub: #223)
predict.refmodel() now uses a default of newdata = NULL. (GitHub: #223)
weights of init_refmodel()’s argument proj_predfun has been removed. (GitHub: #163, #224)
div_minimizer functions have been unified into a single div_minimizer which chooses an appropriate submodel fitter based on the formula of the submodel, not based on that of the reference model. Furthermore, the automatic handling of errors in the submodel fitters has been improved. (GitHub: #230)
plot.vsel(). (GitHub: #234, #270)
cvfun for stanreg fits will now always use inner parallelization in rstanarm::kfold() (i.e., across chains, not across CV folds), with getOption("mc.cores", 1) cores. We do so on all systems (not only Windows). (GitHub: #249)
fit of init_refmodel()’s argument proj_predfun was renamed to fits. This is a non-breaking change since all calls to proj_predfun in projpred have that argument unnamed. However, this cannot be guaranteed in the future, so we strongly encourage users with a custom proj_predfun to rename argument fit to fits. (GitHub: #263)
init_refmodel() has gained argument cvrefbuilder which may be a custom function for constructing the K reference models in a K-fold CV. (GitHub: #271)
project(), varsel(), and cv_varsel() to the divergence minimizer. (GitHub: #278)
init_refmodel(), any contrasts attributes of the dataset’s columns are silently removed. (GitHub: #284)
NAs in data supplied to newdata arguments now trigger an error. (GitHub: #285)
as.matrix.projection() (causing incorrect column names for the returned matrix). (GitHub: #72, #73)
vsel object. (GitHub: #79, #80)
varsel(). (GitHub: #90)
nloo of cv_varsel(). (GitHub: #93)
cv_varsel(), causing an error in case of !validate_search && cv_method != "LOO". (GitHub: #95)
proj_linpred() to raise an error if argument newdata was NULL. (GitHub: #97)
lpd in proj_linpred() (for integrated = TRUE as well as for integrated = FALSE). (GitHub: #105)
proj_linpred()’s calculation of output element lpd (for integrated = TRUE). (GitHub: #106, #112)
proj_linpred()’s output elements pred and lpd (for integrated = FALSE): Now, they are both S x N matrices, with S denoting the number of (possibly clustered) posterior draws and N denoting the number of observations. (GitHub: #107, #112)
proj_predict()’s output matrix to be transposed in case of nrow(newdata) == 1. (GitHub: #112)
proj_linpred(). (GitHub: #114)
varsel()/make_formula to fail with multidimensional interaction terms. (GitHub: #102, #103)
cv_varsel() for models with a single predictor. (GitHub: #115)
nterms of proj_linpred() and proj_predict(). (GitHub: #110)
as.matrix.projection() in case of 1 (clustered) draw after projection. (GitHub: #130)
subfit, make the column names of as.matrix.projection()’s output matrix consistent with other classes of submodels. (GitHub: #132)
nterms_max of plot.vsel() if there is just the intercept-only submodel. (GitHub: #138)
search_path in, e.g., varsel()’s output. (GitHub: #140)
unused argument) when initializing the K reference models in a K-fold CV with CV fits not of class brmsfit or stanreg. (GitHub: #140)
get_refmodel.default(), remove old defunct arguments fetch_data, wobs, and offset. (GitHub: #140)
get_refmodel.stanreg(). (GitHub: #142, #184)
extract_model_data()’s argument extract_y in get_refmodel.default(). (GitHub: #153, commit 39fece8)
extract_model_data() in K-fold CV. (GitHub: #153, commit 4f32195)
proj_predfun() for GLMMs. (GitHub: #174)
proj_predfun() for datafits. (GitHub: #177)
summary.vsel()$selection for objects of class vsel created by varsel(). (GitHub: #179)
search_terms are not consecutive in size. (GitHub: commit 34e24de)
cv_varsel()$pct_solution_terms_cv. (GitHub: #188, commit e529ec1)
glm_elnet() (the workhorse for L1 search), causing the grid for lambda to be constructed without taking observation weights into account. (GitHub: #198; note that the second part of #198 did not have any consequences for users)
print.vsel() causing argument digits to be ignored. (GitHub: #222)
cv_search in varsel() and cv_varsel() to be TRUE for datafits, although it should be FALSE in that case. (GitHub: #223)
Error: Levels '<...>' of grouping factor '<...>' cannot be found in the fitted model. Consider setting argument 'allow_new_levels' to TRUE.) when predicting from submodels which are GLMMs for newdata containing new levels for grouping factors. (GitHub: #223)
predict.refmodel(): Fix a bug for integer ynew. (GitHub: #223)
predict.refmodel(): Fix input checks for offsetnew and weightsnew. (GitHub: #223)
extract_model_data(), the weights and offsets are now checked if they are of length 0 (and if yes, then they are set to vectors of ones and zeros, respectively). This is important for extract_model_data() functions which return weights and offsets of length 0 (see, e.g., brms version <= 2.16.1). (GitHub: #223)
var (the predictive variances) and regul (amount of ridge regularization) to the internal submodel fitter for GLMs. (GitHub: #230)
NAs, an appropriate error is now thrown. Previously, the reference model was created successfully, but this caused opaque errors in downstream code such as project(). (GitHub: #274)
We have fully rewritten the internals in several ways. Most importantly, we now leverage maximum likelihood estimation to third parties depending on the reference model’s family. This allows a lot of flexibility and extensibility for various models. Functionality wise, the major updates since the last release are:
search_terms that allows the user to specify custom unit building blocks of the projections. New vignette coming up.
Better validation of function arguments.
Added print methods for vsel and cvsel objects. Added AUC statistics for binomial family. A few additional minor patches.
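For readers unfamiliar with the AUC statistic mentioned above, the following base-R sketch shows one common rank-based way to compute it from binary outcomes and predicted probabilities. It is an illustration only: the helper name auc_sketch is hypothetical, it is not part of projpred, and it is not claimed to match how projpred computes the statistic internally (for example, observation weights are ignored here).

```r
# Hypothetical helper (not part of projpred): rank-based AUC for binary outcomes.
# y: vector of 0/1 outcomes; p: predicted probabilities (or any score).
auc_sketch <- function(y, p) {
  stopifnot(length(y) == length(p), all(y %in% c(0, 1)))
  n_pos <- sum(y == 1)
  n_neg <- sum(y == 0)
  stopifnot(n_pos > 0, n_neg > 0)
  r <- rank(p)  # average ranks, so tied scores count as 1/2
  # Mann-Whitney U statistic of the positives, scaled to [0, 1]
  (sum(r[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

y <- c(0, 0, 1, 1)
p <- c(0.1, 0.4, 0.35, 0.8)
auc_sketch(y, p)  # 0.75
```

A value of 0.5 corresponds to scores that do not separate the two classes at all, and 1 to perfect separation.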
Removed the dependency on the rngtools package.
This version contains only a few patches, no new features to the user.
stan_glm(log(y) ~ log(x), ...), that is, it did not allow transformation for y.
refmodel-objects using the generic get_refmodel-function, and all the functions use only this object. This makes it much easier to use projpred with other reference models by writing them a new get_refmodel-function. The syntax is now changed so that varsel and cv_varsel both return an object that has similar structure always, and the reference model is stored into this object.
plot/summary. Now it is possible to compare also to the best submodel found, not only to the reference model.
nloo = n by default in cv_varsel. regul=1e-4 now by default in all functions.
cv_search argument for the main functions (varsel, cv_varsel, project and the prediction functions). Now it is possible to make predictions also with those parameter estimates that were computed during the L1-penalized search. This change also allows the user to compute the Lasso-solution by providing the observed data as the ‘reference fit’ for init_refmodel. An example will be added to the vignette.
Until this version, we did not keep record of the changes between different versions. Started to do this from version 0.9.0 onwards.