This vignette introduces users to the main features of the
AssocBin
package. It begins with a high-level overview of
the basic functions and their uses before examining ways to customize
the package behaviour. Rather than get into technical detail from the
beginning, the package outputs and classes will be used to demonstrate
these aspects of the implementation. Hopefully, that will make this
vignette more approachable.
It’s easiest to understand the use of AssocBin in the
context of exploring a data set. Included in the package is a version of
the heart disease data from the UCI machine learning data repository.
The heart
data can be loaded from the package using:
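For example, attaching the package and its bundled data set:

```r
# attach the package and load the included heart disease data
library(AssocBin)
data(heart)
```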
Inspecting the data:
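For instance, a compact structural summary produces the output below:

```r
library(AssocBin)  # provides the heart data
str(heart)
```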
## 'data.frame': 920 obs. of 15 variables:
## $ age : num 63 67 67 37 41 56 62 57 63 53 ...
## $ sex : Factor w/ 2 levels "female","male": 2 2 2 2 1 2 1 1 2 2 ...
## $ cp : Factor w/ 4 levels "atypical","non-angina",..: 4 3 3 2 1 1 3 3 3 3 ...
## $ trestbps: num 145 160 120 130 130 120 140 120 130 140 ...
## $ chol : num 233 286 229 250 204 236 268 354 254 203 ...
## $ fbs : logi TRUE FALSE FALSE FALSE FALSE FALSE ...
## $ restecg : Factor w/ 3 levels "hypertrophy",..: 1 1 1 2 1 2 1 2 1 1 ...
## $ thalach : num 150 108 129 187 172 178 160 163 147 155 ...
## $ exang : logi FALSE TRUE TRUE FALSE FALSE FALSE ...
## $ oldpeak : num 2.3 1.5 2.6 3.5 1.4 0.8 3.6 0.6 1.4 3.1 ...
## $ slope : Factor w/ 3 levels "down","flat",..: 1 2 2 1 3 3 1 3 2 1 ...
## $ ca : Factor w/ 4 levels "0","1","2","3": 1 4 3 1 1 1 3 1 2 1 ...
## $ thal : Factor w/ 3 levels "normal","fixed",..: 2 1 3 1 1 1 1 1 3 3 ...
## $ num : Factor w/ 5 levels "0","1","2","3",..: 1 3 2 1 1 1 4 1 3 2 ...
## $ study : chr "cleveland" "cleveland" "cleveland" "cleveland" ...
It contains 920 observations of 15 variables collected on patients referred to hospitals around the world for a series of measurements of heart function, taken in order to relate these measurements to the presence of coronary heart disease. The variables are:
- age: the patient’s age
- sex: the patient’s sex
- cp: clinical description of any chest pain
- trestbps: resting blood pressure on hospital admission
- chol: blood serum cholesterol concentration
- fbs: indicator of whether fasting blood sugar is greater than 120 mg/dl
- restecg: classification of heart waves at rest as measured by an electrocardiogram
- thalach: maximum heart rate achieved in an exercise test
- exang: whether the exercise test induced angina
- oldpeak: ST heart wave depression induced by the exercise test
- slope: the slope of the ST heart wave peak during the exercise test
- ca: the count of calcified major blood vessels in the heart identified by fluoroscopic imaging
- thal: categorization of any defects in heart circulation induced by exercise as measured by thallium scintigraphy
- num: count of major blood vessels in the heart with a narrowing of greater than 50%
- study: the location of the patient’s testing

Of particular interest is the num variable, the original
response in the study which collected the data (Detrano
et al., 1989). It counts the number of diseased coronary vessels,
where the presence of disease is defined as a narrowing of the vessel by
more than 50% from a healthy baseline. Patients with
num = 0 have hearts without serious coronary artery disease,
and the severity of disease increases with each integer increase of
num as more blood vessels are significantly blocked.
For simplicity, we’ll clean the data somewhat by removing variables that are mostly missing and dropping incomplete observations for the rest of the vignette.
heartClean <- heart
heartClean$thal <- NULL
heartClean$ca <- NULL
heartClean$slope <- NULL
heartClean <- na.omit(heartClean)
str(heartClean)
## 'data.frame': 740 obs. of 12 variables:
## $ age : num 63 67 67 37 41 56 62 57 63 53 ...
## $ sex : Factor w/ 2 levels "female","male": 2 2 2 2 1 2 1 1 2 2 ...
## $ cp : Factor w/ 4 levels "atypical","non-angina",..: 4 3 3 2 1 1 3 3 3 3 ...
## $ trestbps: num 145 160 120 130 130 120 140 120 130 140 ...
## $ chol : num 233 286 229 250 204 236 268 354 254 203 ...
## $ fbs : logi TRUE FALSE FALSE FALSE FALSE FALSE ...
## $ restecg : Factor w/ 3 levels "hypertrophy",..: 1 1 1 2 1 2 1 2 1 1 ...
## $ thalach : num 150 108 129 187 172 178 160 163 147 155 ...
## $ exang : logi FALSE TRUE TRUE FALSE FALSE FALSE ...
## $ oldpeak : num 2.3 1.5 2.6 3.5 1.4 0.8 3.6 0.6 1.4 3.1 ...
## $ num : Factor w/ 5 levels "0","1","2","3",..: 1 3 2 1 1 1 4 1 3 2 ...
## $ study : chr "cleveland" "cleveland" "cleveland" "cleveland" ...
## - attr(*, "na.action")= 'omit' Named int [1:180] 306 331 335 338 348 369 376 379 385 390 ...
## ..- attr(*, "names")= chr [1:180] "306" "331" "335" "338" ...
The simplest ways to use the AssocBin
package to explore
a data set are the DepSearch
and depDisplay
functions. DepSearch
performs all pairwise comparisons
between variables using recursive random binning and returns the results
in a DepSearch
S3 object. depDisplay
generates
a departure display, a two-dimensional histogram highlighting areas of
high and low density, for a given variable pair.
We start by comparing a pair of variables directly. Using
depDisplay, we can inspect the relationship between patient
sex and num using a departure display.
Optional arguments can be supplied to change plot features following
the plot naming conventions.
SexVsNum <- depDisplay(heartClean$sex, heartClean$num, xlab = "Sex",
ylab = "Number of arteries >50% obstructed",
pch = 20)
Labels and point types aside, reading this plot requires a basic
understanding of the underlying algorithm. sex
and
num
are both categorical variables, and so the departure
display is a particular way of encoding the contingency table between
them. Explicitly:
rbind(cbind(table(num = heartClean$num, sex = heartClean$sex), total = table(heartClean$num)),
total = c(table(heartClean$sex), nrow(heartClean)))
## female male total
## 0 131 226 357
## 1 26 178 204
## 2 7 72 79
## 3 8 70 78
## 4 2 20 22
## total 174 566 740
Each coloured cell, or bin, in the departure display corresponds to a
count in the table excluding the columns and rows labelled
total
, which provide the marginal distributions. The width
and height of each bin reflect these distributions and are proportional
to the corresponding row and column totals respectively. The area of
each bin is therefore proportional to the expected proportion of points
it contains under the assumption of independence (when the joint
distribution is proportional to the product of the marginal
distributions). Saturation and hue communicate how severely the observed
count in each bin exceeds or falls short of this expected count.
Take, for example, the bin labelled ‘female’ horizontally and ‘0’ vertically. The width of this bin is given by the count of female patients (174) divided by the total number of patients (740), then multiplied by the width of the plotting area; that is, it occupies a relative width of \(w= 174/740 = 0.235\) of the plot. Its height is similarly determined by the count of patients without any coronary artery disease (CAD) divided by the total number of patients, giving a relative height of \(h= 357/740 = 0.482\) of the plot.
Under independence, the joint probability \(P(\text{sex}=x, \text{num}=y)\) obeys the factorization \[P(\text{sex}=x, \text{num}=y) = P(\text{sex}=x) P(\text{num}=y)\] and so the expected count of patients in our example bin, female patients without CAD, is given by \[\frac{357}{740}*\frac{174}{740}*740=83.9.\] Referring to the analogous cell in the contingency table, we observed 131 such patients. As this is larger than expected, the bin is given a red hue (blue-shaded bins indicate fewer observations than expected). The saturation of this shading is determined by the magnitude of the standardized Pearson residual. For bin \(i\) with expected count \(e_i\), observed count \(o_i\), relative width \(w_i\), and relative height \(h_i\), this is defined as \[r_i = \frac{o_{i} - e_{i}}{\sqrt{e_{i}(1 - w_i)(1 - h_i)}}.\] The standardized Pearson residuals are a corrected version of the usual Pearson residuals for contingency tables, adjusted so that they follow a standard normal distribution. This fact is used in the departure display to determine the saturation: no saturation is applied to standardized residuals with absolute value less than 2, and a colour ramp is applied which achieves its deepest saturation at 4. For our example bin, the standardized residual is \[\frac{131 - 83.9}{\sqrt{83.9 \left ( 1 - \frac{174}{740} \right ) \left ( 1 - \frac{357}{740} \right )}} = 8.17,\] which lies well beyond the upper end of the colour ramp and so receives the deepest possible saturation.
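As a quick arithmetic check, the calculation for this bin can be reproduced in base R (the variable names here are illustrative, not package objects):

```r
o <- 131                              # observed count: female patients with num = 0
e <- (357 / 740) * (174 / 740) * 740  # expected count under independence
w <- 174 / 740                        # relative width (proportion female)
h <- 357 / 740                        # relative height (proportion with num = 0)
r <- (o - e) / sqrt(e * (1 - w) * (1 - h))  # standardized Pearson residual
round(e, 1)  # 83.9
round(r, 2)
```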
The same process applied to this example bin is applied to all other bins to obtain their hues and saturations before \(o_i\) points are overlaid at randomly chosen positions within each bin to add a second visual display of density. In this way, the departure display communicates visually the departure of the observed counts from what we would expect if the two variables were independent. Areas of deep red indicate regions with far more points, and areas of deep blue regions with far fewer points, than we would expect under typical sampling variation. The display therefore draws our attention to the areas that the model of independence does not explain well.
In the example bin, we can see the model of independence does not
describe the observed pattern well: many more female patients lack CAD
and many more male patients have CAD than we would expect under
independence. Note the SexVsNum <- assignment in the
depDisplay call. Aside from plotting, the function invisibly
returns the resulting bins to allow further exploration. As each bin
is stored as a list of features, these are not very easy to inspect:
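For example, one plausible way to produce the output below, using the bins saved as SexVsNum above:

```r
str(SexVsNum, max.level = 1)  # the list of ten bins
str(SexVsNum[[1]])            # the features of a single bin
```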
## List of 10
## $ :List of 7
## $ :List of 7
## $ :List of 7
## $ :List of 7
## $ :List of 7
## $ :List of 7
## $ :List of 7
## $ :List of 7
## $ :List of 7
## $ :List of 7
## List of 7
## $ x : Factor w/ 2 levels "female","male": 1 1 1 1 1 1 1 1 1 1 ...
## $ y : Factor w/ 5 levels "0","1","2","3",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ bnds :List of 2
## ..$ x: num [1:2] 0 174
## ..$ y: num [1:2] 0 357
## $ expn : num 83.9
## $ n : int 131
## $ depth : num 1
## $ stopped: logi TRUE
Helper functions allow us to compute aggregate and individual bin
statistics, however. For example, to compute the \(\chi^2\) test statistic for independence,
simply call binChi
.
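For the bins saved earlier as SexVsNum, one such call is:

```r
binChi(SexVsNum)  # chi-squared statistic and residuals across the bins
```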
## $residuals
## [1] 5.1360485 -3.1718170 -2.6858023 -2.4145554 -1.3950709 -2.8477085
## [7] 1.7586302 1.4891569 1.3387626 0.7735042
##
## $stat
## [1] 67.23966
##
## $nbins
## [1] 10
This computes the \(\chi^2\) statistic and Pearson residuals for the bins. The correct degrees of freedom for this statistic are returned by other helper functions, but more on that later.
When one or more variables are continuous, the output of
depDisplay changes in a few important ways, even though it
is read in the same way. Consider a comparison of age
and num.
set.seed(1235) # more on this later
# the depDisplay function also has a method for data.frames
AgeVsNum <- depDisplay(x = heartClean, pair="age:num", xlab = "Age",
ylab = "Number of arteries >50% obstructed",
pch = 20, col = adjustcolor('gray50', alpha.f=0.5))
Or, for a pair of continuous variables, thalach
and
oldpeak
.
set.seed(812)
thalachVsOldpeak <- depDisplay(heartClean$thalach, heartClean$oldpeak,
                               xlab = "Maximum heart rate during exercise",
                               ylab = "ST wave depression during exercise",
                               pch = 20, col = adjustcolor('gray50', alpha.f=0.5))
Note that in both of these cases, the bins no longer sit on a simple grid. Adding borders makes this even clearer (with the side effect of making this statistical graphic look a bit like a Piet Mondrian piece).
set.seed(812)
thalachVsOldpeak <- depDisplay(heartClean$thalach, heartClean$oldpeak,
                               xlab = "Maximum heart rate during exercise",
                               ylab = "ST wave depression during exercise",
                               pch = 20, col = adjustcolor('gray50', alpha.f=0.5),
                               border = "black")
As before, the labels on the axis of the categorical variable denote the relative sizes of each labelled category. In contrast, the continuous margin is rather chaotic and worth discussing.
When both variables are categorical, the joint distribution can be fully described by the joint probabilities of each bin. When one, or both, of the variables being compared are continuous, representing the joint distribution between the two is more complicated. No single contingency table fully represents their joint distribution because aggregation obscures variation at finer resolutions. As well, any constant grid applied to every data set will have blind spots: patterns which it lacks power to detect. To create a set of bins to display and measure continuous data, then, we need a dynamic algorithm to build a bivariate histogram for a given data set.
Creating such a histogram can be done in many ways (see Chapter 2.3 here for a brief survey), but there are advantages to constructing them using random recursive splits (see Salahub and Oldford, 2025). These splits occur in a stepwise fashion, where each bin is split at each step until a set of stop criteria are satisfied. In the case of random recursive splits to measure association, natural stop criteria are based on the size of the bin which is proportional to the number of points we expect it to contain.
Of course, this requires that we know the expected count of each bin. We can accomplish this by converting continuous margins to their ranks, thereby ensuring a uniform distribution along the corresponding axis. To give a sense of the original distribution, the axis therefore displays the five-number summary of the data at the corresponding ranks: the minimum, quartiles, median, and maximum.
With the construction understood, the interpretation of these plots
continues largely as in the dual categorical case. In the plot
of age and num, we can see dark red areas in
the top right and bottom left corners and light blue areas in the bottom
right and top left, suggesting that the number of blocked arteries tends
to increase with age for the patients in this study. For thalach
and oldpeak, the opposite trend is shown. In both cases, the
saturation is much lighter than in the case of sex and
num, suggesting weaker associations for these latter two
comparisons.
Instead of exploring the data piecemeal using pairs chosen one at a
time, we can assess the associations between all pairs with one call to
the DepSearch
(for Dependence Search)
function.
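For example, with the default settings:

```r
# recursively bin every pair of variables in the cleaned data
heartAssociations <- DepSearch(heartClean)
```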
This returns a DepSearch
object, which contains the
generated bins for all pairs of variables in the dataset along with
details such as the degrees of freedom of the binning, the number of
bins, the \(\chi^2\) statistic for each
pair, and the \(p\)-value of that
statistic. These results can then be viewed at a high level using the
associated summary
method.
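For example, the summary below is produced by:

```r
summary(heartAssociations)
```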
## All 66 pairs in heartClean recursively binned with type distribution:
##
## factor:factor factor:numeric numeric:numeric
## 21 35 10
##
## 52 pairs are significant at 5% and 42 pairs are significant at 1%
##
## Most significant 10 pairs:
## study:chol (1.7e-70)
## study:restecg (4.7e-57)
## study:num (1.3e-38)
## num:exang (1.5e-38)
## num:cp (2.3e-38)
## exang:cp (1.8e-35)
## study:thalach (2.7e-24)
## exang:oldpeak (1.4e-22)
## study:age (9.6e-21)
## num:oldpeak (1.6e-20)
Triplet plots which display the original data, the rank data, and the
bins which form the basis of each \(p\)-value can be inspected using
plot
. By default, this displays the top five strongest
associations.
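For example:

```r
plot(heartAssociations)  # triplet plots of the five strongest associations
```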
The indices of the pairs to display can be specified by the
which
argument. Note that values given to
which
specify the indices of the pairs when placed in order
from strongest to weakest association, so that
plot(heartAssociations, which = 1:5)
produces the same plot
as the default call. As there are 66 pairs in this data, the weakest
associations can be displayed by specifying
which=61:66
.
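That is:

```r
# the weakest associations among the 66 pairs
plot(heartAssociations, which = 61:66)
```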
By providing the data on the original scale, the rank scale, and as it is ‘seen’ by the algorithm through the binning of the ranks, an analyst can quickly understand the structure of any dependence between a pair of variables. Moreover, as all pairs are evaluated using \(p\)-values, comparisons between all pairs are fair regardless of the data types of each pair.
It should be noted here that these \(p\)-values are computed only approximately. As explored in Salahub and Oldford, 2025, the rank margins vary less than uniformly distributed margins because they lie on a lattice. Therefore, the classical \(\chi^2\) test based on arbitrary partitions, which takes \[df = K - 1,\] produces smaller statistics than would be expected for truly uniform data. This creates overly conservative \(p\)-values in the case of comparisons involving one or more continuous variables. Extensive simulations carried out using different approximations found that a simple approximation inspired by contingency tables works quite well to account for this.
For a contingency table with \(R\)
rows and \(C\) columns, we account for
the constrained row and column totals by subtracting a degree of freedom
from each. So, supposing \(RC = K\)
(the total number of bins), the degrees of freedom are not given by
\(K-1\) but instead \[df = (R-1)(C-1).\] Taking this same idea
to the dual continuous case where recursive binning has generated \(K\) bins, we ignore the arbitrary and
mis-aligned nature of the bins and instead treat the \(K\) bins like the result of a regular grid
with its implied contingency table along rows and columns. This suggests
the approximation \[df = (\sqrt{K} -
1)^2,\] which works surprisingly well in practice. Similarly,
when one variable is categorical on \(M\) categories and the other is continuous,
the same line of thinking leads to a formula using the average number of
bins per category as \[df = (M-1) \left (
\frac{K}{M} - 1 \right ).\] Optionally, the argument
ptype
can be set in the call to DepSearch
to
change the \(p\)-value approximation
used. The other options include the conservative \(K-1\), a gamma approximation to the
distribution, and a fitted degrees of freedom based on a large empirical
study.
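The degrees-of-freedom approximations above can be sketched as simple helper functions (these are illustrations, not part of the package API):

```r
# df for a pair of continuous variables recursively binned into K bins
dfConCon <- function(K) (sqrt(K) - 1)^2
# df for a categorical variable on M categories paired with a continuous
# variable, using the average number of bins per category
dfCatCon <- function(K, M) (M - 1) * (K / M - 1)

dfConCon(100)     # a 10 x 10 grid analogue: (10 - 1)^2 = 81
dfCatCon(100, 4)  # (4 - 1) * (100/4 - 1) = 72
```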
For people who want to experiment with recursive binning,
AssocBin
offers plenty of room for customization. While the
default settings split bins randomly until they reach a certain minimum
size, by changing the scoring function and stop criteria, very different
behaviours are possible. Several optional arguments to
DepSearch
control these aspects of binning:
stopCriteria
allows for stop criteria to be set,
catCon
allows specification of the splitting function to
use on the continuous margin of mixed pairs with one categorical and one
continuous variable, and conCon
allows specification of the
splitting function to use for dual continuous margins.
The simplest of these to use and specify is
stopCriteria
, which is supported by the helper function
makeCriteria
. This helper captures the arguments passed to
it and stores these as a single logical expression which is then parsed
and evaluated within each bin to determine whether splitting should
continue. As a result, they must reference one of the named bin features:

- x: vector giving the horizontal coordinates of observations within the bin
- y: vector giving the vertical coordinates of observations within the bin
- bnds: a list of two vectors, x and y, which give the horizontal and vertical extents of the bin
- expn: the expected number of points in the bin
- n: the observed number of points in the bin
- depth: the number of recursive splits required to create the bin from the initial bin containing all points

Arguments passed to makeCriteria which reference objects
not included in this list rely on lexical scoping within R,
and so should be used deliberately and with care. Generally, the stop
criteria can be constructed with a simple call such as
stopCrits <- makeCriteria(depth >= 10, # maximum depth of 10
expn <= 10, # smallest possible bin size of 5
n < 1 # don't split empty bins
)
stopCrits
## [1] "depth >= 10 | expn <= 10 | n < 1 | stopped"
Note that it is necessary to specify a stop criterion of
expn <= 2*k
to restrict bin size to k
, as
splitting a bin with expn < 2*k
will necessarily produce
at least one bin with expn < k
. Of course, more
complicated logical expressions are also possible. For example, one
could implement a splitting procedure that stops splitting any bin which
achieves some threshold for the \(\chi^2\) residual in the bin to create a
greedy algorithm which preserves any large departures it encounters.
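For example, the greedy criteria whose printed form appears below might be constructed as:

```r
# stop splitting any bin whose raw chi residual already exceeds 4
greedyCrits <- makeCriteria(abs(expn - n)/sqrt(expn) > 4,
                            expn <= 10, # smallest possible bin size of 5
                            n < 1)      # don't split empty bins
greedyCrits
```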
## [1] "abs(expn - n)/sqrt(expn) > 4 | expn <= 10 | n < 1 | stopped"
If splitting behaviour more complex than the random splits is
desired, provided functionals can be used to construct custom splitting
functions. While any splitting function can be specified so long as it
accepts a bin and returns a list of two bins
that partition the original, the provided splitting functions are
implemented under a specific framework of optimization.
It can be proven that any convex objective function which compares
\(o_i\) and \(e_i\) (the observed and expected counts
within a bin) will be maximized by a split at one of the observations
within the bin. Therefore, scoring functions need only consider splits
at observation coordinates for many common scores like the mutual
information (implemented as miScores
) or the \(\chi^2\) statistic (implemented as
chiScores
). In each bin, the scoring functions assess the
score resulting from splits at each observation (and some
‘pseudo-observations’ to allow the creation of empty bins) and identify
which coordinate creates a split which optimizes the score. For this
reason, the included scoring functions accept three arguments:
bounds
, nbelow
, and n
, as these
alone can be used to determine the maximum for many bin-scoring
functions.1
In practice, these facts are not relevant to the user when setting up scoring. To set up the algorithm to maximize the \(\chi^2\) statistic, for example, we use the following lines.
conConChi <- function(bn) maxScoreSplit(bin = bn, scorer = chiScores)
# the univariate splitter requires an additional argument specifying which
# margin should be split
catConChi <- function(bn, on) uniMaxScoreSplit(bin = bn, scorer = chiScores,
on = on)
Then, we pass them to the DepSearch
call, maybe
alongside our greedy stop criteria.
heartAssociations_greedy <- DepSearch(heartClean,
stopCriteria=greedyCrits,
catCon=catConChi,
conCon=conConChi)
Plotting this greedy version of the algorithm, the top associations do not change much:
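For example:

```r
plot(heartAssociations_greedy)
```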
Indeed, a key finding of Salahub and Oldford, 2025 is
that binning algorithms based on maximization do not perform much better
than random splits in the identification of patterns, and that
maximization introduces systematic bias to pattern detection. A downside
of maximization, however, can be seen in the considerably inflated
significance of the top association between study
and
chol
in this greedy algorithm compared to the random one.
By actively seeking large residual values, maximization prevents the
computation of correct, or approximately correct, \(p\)-values through typical distributional
approximations. Large simulations must be used instead.
Maximizing in a greedy way is not all bad, however. For one, it makes the algorithm deterministic for a given sample, while the random algorithm is inherently somewhat noisy. Additionally, as is evident for the top association, it produces sharper departure displays which better highlight the areas of low and high point concentration.
Aside from control over how binning is performed, plots of binnings
can be customized in AssocBin
. In the simplest case, this
works by using the usual graphical parameters as shown previously.
# a final way to use depDisplay is on a depSearch object
depDisplay(heartAssociations, pair="thalach:oldpeak",
xlab = "Maximum heart rate during exercise",
ylab = "ST wave depression during exercise",
pch = "+", col = adjustcolor('purple', alpha.f=0.5),
border = "black")
Finer control is obtained using the lower-level
plotBinning
function and the different bin fill helper
functions. Let’s start by saving these particular bins so we can display
them in different ways.
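Since depDisplay invisibly returns the bins it draws, one way to capture them (assuming the pair naming convention shown above) is:

```r
# capture the bins for the thalach:oldpeak pair for replotting
thalachOldpeak <- depDisplay(heartAssociations, pair = "thalach:oldpeak")
```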
To use plotBinning
, these bins must be passed in
alongside a fill function to colour the bins. Fill functions must accept
a list of bins and return a vector of colours that can be interpreted by
R’s plotting functions. While custom fill functions can be defined to
encode any aspect of a bin, the three included fill functions
depthFill
, residualFill
, and
importanceFill
saturate bins based on their depth, the
magnitude and sign of their residuals (based on a provided residual
function), and the threshold on standardized Pearson residuals defined
above. All three options lead to very different displays.
# note that plotBinning does not have access to the marginal information to plot
# quantiles and so the marginal labels give the ranks
plotBinning(thalachOldpeak, pch = 20,
xlab = "Maximum heart rate during exercise",
ylab = "ST wave depression during exercise",
showXax = TRUE, showYax = TRUE,
fill=depthFill(thalachOldpeak))
The depth fill, for example, lets us see the path of the algorithm: lightly shaded areas indicate regions where splitting stopped earlier than in areas with deeper saturation. For this particular pair the pattern is not so striking, but a very different one results from strong linear structures. Consider the following example, which accesses low-level functions to perform binning manually.
x <- rnorm(1000)
y <- 2*x + rnorm(1000, sd = 0.3)
rankx <- rank(x, ties.method = "random")
ranky <- rank(y, ties.method = "random")
# set up splitting criteria: depth stop limits run time (not necessary here)
criteria <- makeCriteria(expn <= 10, n == 0, depth >= 10)
# define the stop function using these criteria
stopFn <- function(bns) stopper(bns, criteria)
# use binner to run the algorithm
xyBins <- binner(x = rankx, y = ranky, stopper = stopFn, splitter = rIntSplit)
# plot with depthfill
set.seed(2119)
plotBinning(xyBins, fill=depthFill(xyBins), pch = 20)
The advantage of the recursive splitting is obvious when viewed with this plot. In contrast to regular grids, the adaptive two-dimensional histogram generated by recursive binning with stop criteria places a greater density of bins, and therefore more focus, in areas of high density than those of low density. Even when these bins are chosen randomly, this creates a more efficient use of the same number of bins.
A more typical fill can be gleaned from the residualFill
function.
plotBinning(thalachOldpeak, pch = 20,
xlab = "Maximum heart rate during exercise",
ylab = "ST wave depression during exercise",
showXax = TRUE, showYax = TRUE,
fill=residualFill(thalachOldpeak, nbr = 10))
The fill from this function simply represents the residuals. By
default, blue indicates a negative residual while red indicates a
positive one. A great deal of customization is possible with this
function: custom colour breaks can be specified using
breaks
, or alternatively the number of breaks can be
specified using nbr
to increase or decrease the resolution.
Should we want to use a different residual function to generate the
saturation, the resFun argument can be specified. We can also change
the colour range using the colrng
argument.
plotBinning(thalachOldpeak, pch = 20,
xlab = "Maximum heart rate during exercise",
ylab = "ST wave depression during exercise",
showXax = TRUE, showYax = TRUE,
fill=residualFill(thalachOldpeak, nbr = 50,
resFun=binMI,
colrng = c("orange", "pink", "blue")))
Finally, the importanceFill
function implements the
procedure described above. It standardizes the \(\chi^2\) residuals and applies a Bonferroni
correction before shading only those bins with residuals significant
when standardized and multiple testing is accounted for. This is the
default fill applied to bins by depDisplay
.
More complex splitting logic based on arbitrary bin
features is supported by sandboxMaxSplit, which applies the
scoring function directly to the list of bins at each step.