Type: | Package |
Title: | A Procedure for Multicollinearity Testing using Bootstrap |
Version: | 1.0.4 |
Date: | 2025-09-09 |
Maintainer: | Víctor Morales-Oñate <vmorales.ppb@gmail.com> |
Description: | Functions for detecting multicollinearity. This test gives statistical support to two of the most famous methods for detecting multicollinearity in applied work: Klein’s rule and Variance Inflation Factor (VIF). See the URL for the papers associated with this package, as for instance, Morales-Oñate and Morales-Oñate (2015) <doi:10.33333/rp.vol51n2.05>. |
Depends: | R (≥ 4.1.0) |
License: | GPL (≥ 3) |
Encoding: | UTF-8 |
Imports: | ggplot2,plotly |
Repository: | CRAN |
URL: | https://github.com/vmoprojs/MTest |
BugReports: | https://github.com/vmoprojs/MTest/issues |
LazyData: | true |
NeedsCompilation: | no |
Packaged: | 2025-09-11 03:49:58 UTC; victormoralesonate |
Author: | Víctor Morales-Oñate
|
Date/Publication: | 2025-09-11 10:00:23 UTC |
Bootstrap-based test for multicollinearity (Klein and VIF)
Description
MTest
implements a nonparametric (pairs) bootstrap to assess
multicollinearity by providing achieved significance levels (ASL)
for two widely used diagnostics: Klein's rule and the Variance Inflation
Factor (VIF). It returns bootstrap distributions of the global R^2
and the auxiliary R_j^2
(from regressions of each predictor on the remaining
predictors), along with p-values for both rules.
Usage
MTest(object, nboot = 100, nsam = NULL, trace = FALSE, seed = NULL,
valor_vif = 0.9)
Arguments
object |
A fitted model, typically of class |
nboot |
Integer. Number of bootstrap iterations (rows resampled with replacement). |
nsam |
Integer. Bootstrap sample size per iteration. Defaults to the original number of rows. |
trace |
Logical. If |
seed |
Integer. Optional RNG seed for reproducibility. |
valor_vif |
Numeric in |
Details
Model. Consider the linear regression model
Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_p X_{pi} + u_i,\quad i=1,\ldots,n,
and the auxiliary regressions obtained by regressing each predictor X_j
on the remaining predictors X_{-j}
. Let R_g^2
be the global coefficient of
determination and R_j^2
the coefficient of determination of the j
-th auxiliary regression.
Diagnostics and achieved significance levels (ASL).
-
Klein's rule: flag multicollinearity if
R_j^2 > R_g^2
. We estimate the ASL asP(R_g^2 < R_j^2)
using the bootstrap distribution. -
VIF rule: flag multicollinearity if
VIF
exceeds a threshold. Since\mathrm{VIF}_j = 1/(1-R_j^2)
, this is equivalent to testingR_j^2
againstvalor_vif
. We estimateP(R_j^2 >
valor_vif
)
.
Bootstrap scheme.
The function resamples rows of the model frame (pairs bootstrap) and, for each bootstrap
sample, computes R_g^2
and R_j^2
(hence VIF) using the same expanded design
matrix as the original fit. This makes the procedure robust to transformed terms on either
side of the formula (e.g., log(y)
, I(X1^2)
, interactions, factors,
poly()
, etc.).
Value
An object of class MTest
, which is a list containing:
pval_vif |
Named numeric vector of ASL for the VIF rule,
|
pval_klein |
Named numeric vector of ASL for Klein's rule,
|
Bvals |
Numeric matrix of size |
VIFvals |
Numeric matrix |
vif.tot |
Observed VIF per predictor from the original design. |
R.tot |
Named numeric vector with observed |
nsam |
Bootstrap sample size actually used. |
nboot |
Number of bootstrap iterations actually performed. |
Interpretation
Larger
pval_klein[j]
indicates stronger evidence that predictorj
violates Klein's rule (R_j^2
often exceedsR_g^2
).Larger
pval_vif[j]
indicates thatR_j^2
frequently exceedsvalor_vif
(equivalently,VIF
exceeds the implied threshold).
Notes
For factor predictors, the underlying design includes multiple columns;
VIFvals
and VIF-related summaries are returned per design column.
In singular bootstrap samples some statistics may be NA
.
Author(s)
Víctor Morales Oñate vmorales.ppb@gmail.com
Bolívar Morales Oñate bmoralesonate@gmail.com
https://sites.google.com/site/moralesonatevictor/
https://www.linkedin.com/in/vmoralesonate/
References
Morales-Oñate, V., and Morales-Oñate, B. (2023). MTest: a Bootstrap Test for Multicollinearity. Revista Politécnica, 51(2), 53–62. doi:10.33333/rp.vol51n2.05
See Also
vif
for classical VIF computation.
Examples
## Minimal example (small nboot for speed)
set.seed(1)
data(simDataMTest, package = "MTest")
m1 <- stats::lm(y ~ ., data = simDataMTest)
boot.sol <- MTest(m1, nboot = 50, trace = FALSE, seed = 123, valor_vif = 0.90)
boot.sol$pval_vif
boot.sol$pval_klein
head(boot.sol$Bvals)
print(boot.sol)
Pairwise Kolmogorov–Smirnov tests for matrix columns
Description
Computes pairwise Kolmogorov–Smirnov (KS) tests between all columns of a
numeric matrix or data frame, returning the matrix of p-values. Typical inputs
include the Bvals
matrix from MTest
. For one-sided alternatives
("greater"
or "less"
), the p-value matrix is directional:
rows correspond to x
and columns to y
.
Usage
pairwiseKStest(X,
alternative = c("greater","less","two.sided"),
use = c("asis","pairwise.complete.obs"),
exact = NULL)
Arguments
X |
Numeric matrix or data frame. Columns are compared pairwise by KS tests.
A common use is |
alternative |
Character string: |
use |
Character string: |
exact |
Logical or |
Details
The function performs a KS test for each ordered pair of columns (i, j)
using ks.test(X[, i], X[, j], alternative = alternative, exact = exact)
.
For one-sided alternatives, the result is not symmetric, since rows play the
role of x
and columns the role of y
.
The returned Suggestion
follows the same rule as the original function:
for alternative = "greater"
, it sorts the row sums of the p-value matrix
(descending); for "less"
, it sorts the column sums; for "two.sided"
,
no suggestion is returned.
Value
A list of class pairwiseKStest
with components:
KSpwMatrix |
Numeric matrix of p-values. Rows are |
alternative |
Character string describing the alternative hypothesis used. |
Suggestion |
For |
Author(s)
Víctor Morales Oñate vmorales.ppb@gmail.com
Bolívar Morales Oñate bmoralesonate@gmail.com
https://sites.google.com/site/moralesonatevictor/
https://www.linkedin.com/in/vmoralesonate/
References
Morales-Oñate, V., and Morales-Oñate, B. (2023). MTest: a Bootstrap Test for Multicollinearity. Revista Politécnica, 51(2), 53–62. doi:10.33333/rp.vol51n2.05
See Also
Examples
## Typical workflow with MTest:
## (use small nboot for speed in examples)
set.seed(1)
data(simDataMTest, package = "MTest")
m1 <- stats::lm(y ~ ., data = simDataMTest)
boot.sol <- MTest(m1, nboot = 30, trace = FALSE, seed = 123)
## Compare only predictors (exclude "global"):
ks_res_greater <- pairwiseKStest(boot.sol$Bvals[, -1],
alternative = "greater",
use = "asis", # same behavior as the original
exact = NULL) # let ks.test decide
ks_res_greater$KSpwMatrix
ks_res_greater$Suggestion
## Two-sided (no suggestion by design):
ks_res_twosided <- pairwiseKStest(boot.sol$Bvals[, -1],
alternative = "two.sided")
ks_res_twosided$KSpwMatrix
Plot density or empirical cumulative distribution from MTest
Description
Plot density or empirical cumulative distribution from Bvals
in MTest output
.
Usage
## S3 method for class 'MTest'
plot(x, type=1,plotly = FALSE,...)
Arguments
x |
an object of the class |
type |
Numeric; 1 if density, 2 if ecdf plot is returned |
plotly |
Logical; if |
... |
other arguments to be passed to the function
|
Details
This function plots density or empirical cumulative distribution function from MTest bootstrap replications.
Value
Produces a plot. No values are returned.
See Also
MTest
for procedure and examples.
Simulated data for MTest
Description
This data set helps testing functions in MTest package, the generating process is documented in the reference.
Usage
simDataMTest
Format
A dataframe containing 10000 observations and four columns.
References
Morales-Oñate, V., and Morales-Oñate, B. (2023). MTest: a Bootstrap Test for Multicollinearity. Revista Politécnica, 51(2), 53–62. doi:10.33333/rp.vol51n2.05