This vignette distils the key SelectBoost.gamlss workflows into lightweight pseudocode. The goal is to surface the control flow and data preparation steps that matter when you use the package.
- sb_gamlss(): scoping, correlated resampling, and engine dispatch.
- sb_prepare_selectboost() derives grouped SelectBoost simulations from your design matrices.
- selection_table(), confidence_table(), and sb_gamlss_c0_grid() consume the intermediate objects.

Notation:
- scope: candidate terms for a parameter (μ, σ, ν, τ).
- base: always-included terms for a parameter.

The main helper, sb_gamlss(), orchestrates correlated resampling, engine-specific fits, aggregation, and a final refit with stable terms.
Algorithm sb_gamlss(formula, data, family, scopes, base_formulas, B, sample_fraction, pi_thr, engines, c0, use_groups)
  1. Validate formulas, convert data.frame inputs, and optionally standardise numeric predictors.
  2. Build base design matrices per parameter, keeping track of sanitized column names and term maps.
  3. For each scope formula:
     a. Call sb_prepare_selectboost(data, scope, B, c0, use_groups) to obtain
        normalised matrices, grouped indices, and pre-simulated SelectBoost draws.
     b. Form the "upper" formula that contains both base and candidate columns for stepwise refits.
  4. For each parameter (μ, σ, ν, τ):
     a. Define a selector callback that, given a candidate design matrix subset and response subset,
        fits the requested engine (stepGAIC, glmnet, grpreg, or sgl) on a bootstrap subsample of rows.
     b. Use SelectBoost::boost.apply() with the correlated simulations to repeat the selector B times,
        returning a coefficient matrix whose rows correspond to candidate columns.
     c. Convert column-level selection frequencies into term-level counts, respecting scope term maps.
  5. Collate selection tables for all parameters and mark base terms as always selected.
  6. Augment each base formula with terms whose selection proportion ≥ pi_thr.
  7. Refit gamlss() on the full data using the final formulas and return the sb_gamlss object.
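Steps 4c and 6 can be illustrated with a small, language-neutral sketch, written here in Python rather than as package code. It assumes a term counts as selected in a run when at least one of its design-matrix columns is selected; the source only says that column-level frequencies are converted to term-level counts, so that rule, the function names, and the toy data are all hypothetical.

```python
from collections import defaultdict


def term_selection_proportions(selected_columns_per_run, term_map):
    """Share of runs in which each term had at least one selected column.

    selected_columns_per_run: list of sets of column names, one set per resample.
    term_map: dict mapping a design-matrix column name to its originating term.
    """
    n_runs = len(selected_columns_per_run)
    counts = defaultdict(int)
    for columns in selected_columns_per_run:
        # Assumption: a term counts once per run, however many of its columns were picked.
        for term in {term_map[c] for c in columns}:
            counts[term] += 1
    return {term: counts[term] / n_runs for term in sorted(set(term_map.values()))}


def stable_terms(proportions, pi_thr):
    """Terms whose selection proportion reaches the threshold pi_thr (step 6)."""
    return [term for term, p in proportions.items() if p >= pi_thr]


# Toy data: the factor "f" expands to two columns, f_a and f_b.
term_map = {"x1": "x1", "f_a": "f", "f_b": "f", "x2": "x2"}
runs = [{"x1", "f_a"}, {"x1", "f_b"}, {"x1"}, {"x1", "f_a", "f_b", "x2"}]

props = term_selection_proportions(runs, term_map)
stable = stable_terms(props, pi_thr=0.6)
print(props)   # {'f': 0.75, 'x1': 1.0, 'x2': 0.25}
print(stable)  # ['f', 'x1']
```

The stable terms are the ones that would be appended to the base formula before the final refit.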
sb_gamlss_c0_grid() automates repeated stability runs over a vector of \(c_0\) thresholds, while autoboost_gamlss() wraps the grid in a single call that also picks the \(c_0\) value.
Algorithm sb_gamlss_c0_grid(args, c0_grid)
  1. For each c0 value:
     a. Call sb_gamlss() with the supplied arguments and the current c0.
     b. Append the resulting selection table with an extra column storing c0.
  2. Combine all selection tables, keep the reference to each fitted sb_gamlss object,
     and record the stability threshold (pi_thr).
Algorithm autoboost_gamlss(args, c0_grid)
1. Run sb_gamlss_c0_grid() to obtain fits and per-term selection proportions across c0.
2. For each c0, sum the positive excess of selection proportions above pi_thr.
3. Select the c0 with the highest total excess (ties resolved towards the median grid value).
4. Return the sb_gamlss fit associated with the chosen c0, tagging it with diagnostic metadata.
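The selection rule in steps 2 and 3 can be sketched as follows. This is an illustrative Python sketch, not the package's R implementation: the function names are hypothetical, and "ties resolved towards the median grid value" is read here as "among tied c0 values, pick the one closest to the median of the grid", which is an assumption.

```python
import math
from statistics import median


def total_excess(proportions, pi_thr):
    """Sum of the positive parts of (proportion - pi_thr) over all terms."""
    return sum(max(p - pi_thr, 0.0) for p in proportions)


def choose_c0(props_by_c0, pi_thr):
    """Return the c0 with the largest total excess, plus the per-c0 scores."""
    scores = {c0: total_excess(props, pi_thr) for c0, props in props_by_c0.items()}
    best = max(scores.values())
    tied = [c0 for c0, s in scores.items() if math.isclose(s, best)]
    # Assumed tie-break: prefer the tied c0 closest to the median grid value.
    grid_median = median(scores)
    chosen = min(tied, key=lambda c0: abs(c0 - grid_median))
    return chosen, scores


# Toy per-term selection proportions for three c0 values.
props_by_c0 = {
    0.2: [1.0, 0.75, 0.25],
    0.5: [1.0, 0.75, 0.5],
    0.8: [0.75, 0.5, 0.25],
}
chosen, scores = choose_c0(props_by_c0, pi_thr=0.5)
print(chosen)  # 0.5
```

In the toy grid, c0 = 0.2 and c0 = 0.5 both reach a total excess of 0.75, so the tie-break decides.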
Two convenience helpers reuse the core algorithm with modified budgets or grids.
Algorithm fastboost_gamlss(args)
1. Override B (default 30) and sample_fraction (default 0.6).
2. Delegate to sb_gamlss() with the reduced budget for faster, approximate screening.
Algorithm tune_sb_gamlss(config_grid, base_args, metric)
  1. For each configuration in config_grid:
     a. Merge it into base_args and run a small sb_gamlss() fit using B_small bootstraps.
     b. If metric == "stability", compute the mass of selection proportions above pi_thr
        and subtract score_lambda × (# stable terms).
     c. If metric == "deviance", compute the cross-validated deviance via cv_deviance_sb().
  2. Choose the configuration with the highest score and return both the winning sb_gamlss fit
     and a score table for auditing.
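The "stability" metric in step 1b can be sketched in a few lines of Python. The sketch assumes that "mass above pi_thr" means the sum of the selection proportions that reach pi_thr; the source does not spell this out, and the configuration labels and function names below are hypothetical.

```python
def stability_score(proportions, pi_thr, score_lambda):
    """Assumed score: sum of stable proportions minus a per-term penalty."""
    stable = [p for p in proportions if p >= pi_thr]
    return sum(stable) - score_lambda * len(stable)


def pick_config(results, pi_thr, score_lambda):
    """Score every configuration and return the winner with the full score table."""
    table = [
        (name, stability_score(props, pi_thr, score_lambda))
        for name, props in results.items()
    ]
    winner = max(table, key=lambda row: row[1])[0]
    return winner, table


# Toy selection proportions from two hypothetical configurations.
results = {
    "config_a": [1.0, 0.75, 0.5, 0.25],
    "config_b": [1.0, 0.25, 0.25, 0.0],
}
winner, table = pick_config(results, pi_thr=0.5, score_lambda=0.25)
print(winner)  # config_a
print(table)   # [('config_a', 1.5), ('config_b', 0.75)]
```

The penalty term discourages configurations that reach a high score only by declaring many terms stable.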
Downstream diagnostics turn stability curves into interpretable rankings.
Algorithm confidence_table(grid, pi_thr)
1. Group grid$table by parameter and term.
2. Within each group, report the maximum selection count and derive selection proportions.
3. Return a data frame combining all parameters with the supplied threshold.
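A minimal Python sketch of the grouping in steps 1 and 2, under stated assumptions: each row of the grid table is taken to carry a parameter, a term, a c0 value, and a selection count, and the proportion is taken to be the count divided by the number of resamples B. The field names are hypothetical and not taken from the package.

```python
def confidence_rows(rows, B, pi_thr):
    """Maximum selection count per (parameter, term) across the c0 grid."""
    best = {}
    for row in rows:
        key = (row["parameter"], row["term"])
        best[key] = max(best.get(key, 0), row["count"])
    return [
        {
            "parameter": parameter,
            "term": term,
            "max_count": count,
            "max_prop": count / B,  # assumed: proportion = count / B
            "pi_thr": pi_thr,
        }
        for (parameter, term), count in sorted(best.items())
    ]


# Toy grid table: two terms, each observed at two c0 values.
rows = [
    {"parameter": "mu", "term": "x1", "c0": 0.2, "count": 40},
    {"parameter": "mu", "term": "x1", "c0": 0.8, "count": 50},
    {"parameter": "sigma", "term": "x2", "c0": 0.2, "count": 10},
    {"parameter": "sigma", "term": "x2", "c0": 0.8, "count": 20},
]
table = confidence_rows(rows, B=50, pi_thr=0.6)
for entry in table:
    print(entry)
```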
Algorithm confidence_functionals(grid, pi_thr, q, weight_fun, conservative)
  1. Optionally replace observed proportions with Wilson lower bounds if conservative = TRUE.
  2. For each term:
     a. Sort by c0, integrate the selection curve using trapezoidal (or step) rule to obtain AUSC.
     b. Compute the thresholded positive area, weighted AUSC, coverage, extrema, and quantiles q.
     c. Combine the metrics into a rank_score summary.
  3. Order terms by rank_score for reporting and plotting helpers.
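Two of the ingredients above, the Wilson lower bound from step 1 and the trapezoidal AUSC from step 2a, are standard enough to sketch in Python. The function names are hypothetical, the 95% level (z = 1.96) is an assumption, and the thresholded area is approximated by clipping the curve at the grid points. How the package combines the metrics into rank_score is not stated in the text, so that step is left out.

```python
import math


def wilson_lower(k, n, z=1.96):
    """Lower end of the Wilson score interval for k successes out of n trials."""
    if n == 0:
        return 0.0
    p = k / n
    centre = p + z * z / (2 * n)
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return max(0.0, (centre - half_width) / (1 + z * z / n))


def ausc(c0_values, proportions):
    """Area under the selection curve: sort by c0, then apply the trapezoidal rule."""
    pairs = sorted(zip(c0_values, proportions))
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(pairs, pairs[1:])
    )


def thresholded_area(c0_values, proportions, pi_thr):
    """Area of the part of the curve above pi_thr, clipped at the grid points."""
    return ausc(c0_values, [max(p - pi_thr, 0.0) for p in proportions])


# Toy selection curve for one term, deliberately given out of order.
c0_values = [1.0, 0.0, 0.5]
proportions = [0.0, 1.0, 0.5]

area = ausc(c0_values, proportions)
excess = thresholded_area(c0_values, proportions, pi_thr=0.5)
print(area, excess)  # 0.5 0.125
```

With conservative = TRUE, each observed proportion k / B would be replaced by wilson_lower(k, B) before the integration, which shrinks curves that rest on few resamples.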
Together, these routines document how SelectBoost.gamlss orchestrates correlated resampling, selection aggregation, hyper-parameter exploration, and the confidence metrics used for reporting.