| Title: | Select Variables for Linear Models |
| Version: | 1.0.0 |
| Description: | Provides variable selection for linear models and generalized linear models using Bayesian information criterion (BIC) and model posterior probability (MPP). Given a set of candidate predictors, it evaluates candidate models and returns model-level summaries (BIC and MPP) and predictor-level posterior inclusion probabilities (PIP). For more details see Xu, S., Ferreira, M. A., & Tegge, A. N. (2025) <doi:10.48550/arXiv.2510.02628>. |
| License: | GPL-3 |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.2 |
| Imports: | stats (≥ 4.2.2), GA (≥ 3.2.3), memoise (≥ 2.0.1) |
| Suggests: | knitr, rmarkdown, testthat (≥ 3.0.0) |
| VignetteBuilder: | knitr |
| LazyData: | true |
| Depends: | R (≥ 3.5.0) |
| Config/testthat/edition: | 3 |
| NeedsCompilation: | no |
| Packaged: | 2026-02-12 19:32:40 UTC; xushu |
| Author: | Shuangshuang Xu [aut, cre] |
| Maintainer: | Shuangshuang Xu <xshuangshuang@vt.edu> |
| Repository: | CRAN |
| Date/Publication: | 2026-02-17 16:00:08 UTC |
Function for checking model hierarchy
Description
Function for checking model hierarchy
Usage
check_model_hierarchy(model)
Value
model hierarchy
A data frame containing a dependent variable and continuous independent variables
Description
A data frame with seven columns. The independent variables are in the first six columns. The dependent variable is in the seventh column.
Usage
dat
Format
dat
A data frame.
Fitting generalized linear models for the best model
Description
glm.best fits a generalized linear model for the best model selected by modelselect.glm.
Usage
glm.best(
object,
family,
method = "models",
threshold = 0.95,
x = FALSE,
y = FALSE
)
Arguments
object |
the model selection result from |
family |
a character string naming a family function describing the error distribution to be used in the model. |
method |
the criterion used for model selection.
|
threshold |
The threshold for variable selection. The variables with posterior inclusion probability larger than the threshold are selected in the best model. The default is 0.95. |
x, y |
logicals. If |
Value
An object of class "glm", which is a list containing the following components:
coefficients: a named vector of coefficients.
residuals: the working residuals, that is the residuals in the final iteration of the IWLS fit.
fitted.values: the fitted mean values, obtained by transforming the linear predictors by the inverse of the link function.
rank: the numeric rank of the fitted linear model.
family: the family object used.
linear.predictors: the linear fit on the link scale.
deviance: up to a constant, minus twice the maximized log-likelihood.
aic: A version of Akaike's An Information Criterion, minus twice the maximized log-likelihood plus twice the number of parameters, computed by the aic component of the family.
null.deviance: The deviance for the null model, comparable with deviance. The null model will include the offset, and an intercept if there is one in the model.
iter: the number of iterations of IWLS used.
weights: the working weights, that is the weights in the final iteration of the IWLS fit.
prior.weights: the weights initially supplied, a vector of 1s if none were.
df.residual: the residual degrees of freedom.
df.null: the residual degrees of freedom for the null model.
y: if requested, the response vector used.
converged: logical. Was the IWLS algorithm judged to have converged?
boundary: logical. Is the fitted value on the boundary of the allowable values?
model: if requested (the default), the model frame used.
call: the matched call.
formula: the formula supplied.
terms: the terms.object used.
data: the data argument.
threshold: the threshold used for method = "variables".
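The components listed above are those of an ordinary "glm" object, so they can be inspected the same way as the result of stats::glm. The following is a minimal base-R sketch, not a call to glm.best itself: the mtcars data set and the formula am ~ wt are illustrative assumptions. It shows a family passed as a character string and the effect of x = FALSE and y = FALSE on what is stored.

```r
# Base-R sketch only; mtcars and am ~ wt are illustrative assumptions,
# and stats::glm stands in for the final fitting step.
fit <- glm(am ~ wt, data = mtcars, family = "binomial",
           x = FALSE, y = FALSE)

class(fit)                # class vector of the fitted object
fit$family$family         # name of the family resolved from the string
fit$converged             # did the IWLS algorithm converge?
coef(fit)                 # named vector of coefficients

# With x = FALSE and y = FALSE the model matrix and response are not kept.
# names() is used for exact matching, since `$` matches list names partially.
stored <- names(fit)
c(x_stored = "x" %in% stored, y_stored = "y" %in% stored)
```

Setting x or y to TRUE in the same call would add the model matrix or the response vector to the returned list.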
A data frame containing a dependent variable and binary independent variables
Description
A data frame with seven columns. The independent variables are in the first six columns. The dependent variable is in the seventh column.
Usage
glmdat
Format
glmdat
A data frame.
Fitting linear models for the best model
Description
lm.best fits a linear model for the best model selected by modelselect.lm.
Usage
lm.best(object, method = "models", threshold = 0.95, x = FALSE, y = FALSE)
Arguments
object |
the model selection result from |
method |
the criterion used for model selection.
|
threshold |
The threshold for variable selection. The variables with posterior inclusion probability larger than the threshold are selected in the best model. The default is 0.95. |
x, y |
logicals. If |
Value
An object of class "lm", which is a list containing the following components:
coefficients: A named vector of coefficients.
residuals: The residuals, that is the response minus the fitted values.
fitted.values: The fitted mean values.
rank: The numeric rank of the fitted linear model.
df.residual: The residual degrees of freedom.
call: The matched call.
terms: The terms object used.
model: (If requested) the model frame used.
qr: (If requested) the QR decomposition of the design matrix.
xlevels: (If the model formula includes factors) a record of the levels of the factors.
contrasts: (If the model formula includes factors) the contrasts used.
offset: The offset used.
threshold: the threshold used for method = "variables".
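The threshold rule described for method = "variables" can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the posterior inclusion probabilities below are hypothetical values, mtcars is a stand-in data set, and the intercept-only fallback when no variable passes the threshold is an assumption of this sketch.

```r
# Hypothetical posterior inclusion probabilities, for illustration only.
pip <- c(wt = 0.99, hp = 0.97, qsec = 0.40)
threshold <- 0.95

# Keep the variables whose PIP is larger than the threshold.
selected <- names(pip)[pip > threshold]

# Build the formula from the selected variables. The intercept-only
# fallback is an assumption of this sketch.
rhs <- if (length(selected) > 0) selected else "1"
fit <- lm(reformulate(rhs, response = "mpg"), data = mtcars)

selected
names(coef(fit))
```

Raising the threshold makes the selected model more parsimonious, since fewer variables clear it.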
Variable selection for generalized linear models
Description
Uses the Bayesian information criterion (BIC) to perform variable selection for generalized linear models.
Usage
modelselect.glm(
formula,
data,
family,
GA_var = 16,
maxiterations = 2000,
runs_til_stop = 1000,
monitor = TRUE,
popSize = 100,
verbose = TRUE
)
Arguments
formula |
an object of class "formula": a symbolic description of the model to be fitted.
A typical model has the form |
data |
a data frame containing the variables in the model. |
family |
a character string naming a family function describing the error distribution to be used in the model. |
GA_var |
if the number of variables is smaller than |
maxiterations |
the maximum number of iterations to run before the GA search is halted. |
runs_til_stop |
the number of consecutive generations without any improvement in the best fitness value before the GA is stopped. |
monitor |
a logical defaulting to TRUE showing the evolution of the search. If monitor = FALSE, any output is suppressed. |
popSize |
the population size. |
verbose |
Logical; if TRUE, print a brief summary of results. |
Value
modelselect.glm returns a list containing the following components:
models: A data frame of candidate models' BIC and posterior probabilities, sorted by decreasing posterior probability.
variables: A data frame of candidate variables' posterior inclusion probabilities.
data: The data with variables in the formula.
Use glm.best to fit the best model, selected either by model posterior probability or by thresholding the variables' posterior inclusion probabilities.
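As a reminder of the quantity being compared across candidate models, BIC for a fitted generalized linear model is minus twice the maximized log-likelihood plus the number of estimated parameters times the log of the number of observations. The base-R sketch below checks that definition against stats::BIC. The mtcars data set and the formula am ~ wt are illustrative assumptions.

```r
# Base-R sketch; mtcars and am ~ wt are illustrative assumptions.
fit <- glm(am ~ wt, data = mtcars, family = "binomial")

ll <- logLik(fit)        # maximized log-likelihood
k  <- attr(ll, "df")     # number of estimated parameters
n  <- nobs(fit)          # number of observations

# BIC = -2 * logLik + k * log(n)
bic_manual <- -2 * as.numeric(ll) + k * log(n)

# Agrees with the built-in computation up to floating-point tolerance.
all.equal(bic_manual, BIC(fit))
```

Lower BIC indicates a better trade-off between fit and model size.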
Variable selection for linear models
Description
Uses the Bayesian information criterion (BIC) to perform variable selection for linear models.
Usage
modelselect.lm(
formula,
data,
GA_var = 16,
maxiterations = 2000,
runs_til_stop = 1000,
monitor = TRUE,
popSize = 100,
verbose = TRUE
)
Arguments
formula |
an object of class "formula": a symbolic description of the model to be fitted.
A typical model has the form |
data |
a data frame containing the variables in the model. |
GA_var |
if the number of variables is smaller than |
maxiterations |
the maximum number of iterations to run before the GA search is halted. |
runs_til_stop |
the number of consecutive generations without any improvement in the best fitness value before the GA is stopped. |
monitor |
a logical defaulting to TRUE showing the evolution of the search. If monitor = FALSE, any output is suppressed. |
popSize |
the population size. |
verbose |
Logical; if TRUE, print a brief summary of results. |
Value
modelselect.lm returns a list containing the following components:
models: A data frame of candidate models' BIC and posterior probabilities, sorted by decreasing posterior probability.
variables: A data frame of candidate variables' posterior inclusion probabilities.
data: The data with variables in the formula.
Use lm.best to fit the best model, selected either by model posterior probability or by thresholding the variables' posterior inclusion probabilities.
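The relationship between the models and variables components can be sketched in base R for a small problem. This is a minimal illustration, not the package's code. It assumes equal prior probabilities across models and the standard approximation in which a model's posterior probability is proportional to exp(-BIC / 2); the package's exact computation may differ. The mtcars data set, the chosen predictors, and the column names BIC, MPP, variable and PIP are all assumptions of this sketch.

```r
# Base-R sketch: exhaustive search over subsets of a few predictors.
# mtcars, the predictors, and all column names are illustrative assumptions.
predictors <- c("wt", "hp", "qsec")
response <- "mpg"

# All 2^p inclusion patterns, one row per candidate model.
patterns <- expand.grid(rep(list(c(FALSE, TRUE)), length(predictors)))
names(patterns) <- predictors

# Fit each candidate model and record its BIC.
bic <- apply(patterns, 1, function(keep) {
  rhs <- if (any(keep)) predictors[keep] else "1"
  BIC(lm(reformulate(rhs, response = response), data = mtcars))
})

# Approximate model posterior probabilities under equal model priors.
# Subtracting min(bic) guards against numerical underflow.
w <- exp(-(bic - min(bic)) / 2)
mpp <- w / sum(w)

# Model-level summary, sorted by decreasing posterior probability.
models <- cbind(patterns, BIC = bic, MPP = mpp)
models <- models[order(-models$MPP), ]

# Posterior inclusion probability: total probability of the models
# that contain each predictor.
pip <- colSums(as.matrix(patterns) * mpp)
variables <- data.frame(variable = predictors, PIP = as.numeric(pip))

models
variables
```

Exhaustive enumeration grows as 2^p, which is why a search heuristic such as a genetic algorithm becomes attractive when the number of candidate predictors is large.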