SHAPBoost

SHAPBoost is an R package for the implementation of the SHAPBoost feature selection algorithm, which is a boosting method that uses SHAP values for feature ranking and selects in an iterative forward fashion. It is designed to work with regression and survival analysis.

Installation

You can install the development version of SHAPBoost from GitHub with:

# install.packages("pak")
pak::pak("O-T-O-Z/SHAPBoost-R")

Regression example

For regression tasks, SHAPBoost can be used with various evaluators such as linear regression or XGBoost (xgb). For metrics, it support mae (Mean Absolute Error), mse (Mean Squared Error), and r2 (R-squared or \(R^{2}\)).

Below is an example using eyedata.

library(SHAPBoost)
library(flare)
data(eyedata)

shapboost <- SHAPBoostRegressor$new(
    evaluator = "lr",
    metric = "mae",
    siso_ranking_size = 10,
    verbose = 0,
)

X <- as.data.frame(x)
y <- as.data.frame(y)
subset <- shapboost$fit(X, y)

Survival example

For survival analysis, SHAPBoost can be used with the coxph or xgb evaluator and the c-index metric. Please provide the survival data in a format where the first column is the time to event and the second column is the event indicator (1 for event, 0 for censored). Moreover, the xgb_params argument can be used to pass additional parameters to the XGBoost model, such as objective and eval_metric. Supported objectives are survival:cox and survival:aft, with their respective evaluation metrics cox-nloglik and aft-nloglik.

An example using the gbsg dataset is shown below.

library(SHAPBoost)
library(survival)

shapboost <- SHAPBoostSurvival$new(
    evaluator = "coxph",
    metric = "c-index",
    verbose = 0,
    xgb_params = list(
        objective = "survival:cox",
        eval_metric = "cox-nloglik"
    )
)
X <- as.data.frame(gbsg[, -c(1, 10, 11)])
y <- as.data.frame(gbsg[, c(10, 11)])

subset <- shapboost$fit(X, y)