Title: Causal Decomposition Analysis
Version: 0.2.0
Depends: R (≥ 3.5)
Imports: stats, parallel, MASS, nnet, SuppDists, PSweight, utils, rlang, DynTxRegime, distr, rpart, dplyr, modelObj, magrittr, knitr
Suggests: rmarkdown, rpart.plot, spelling, CBPS, pbmcapply
VignetteBuilder: knitr
Description: We implement causal decomposition analysis using methods proposed by Park, Lee, and Qin (2022) and Park, Kang, and Lee (2023), which provide researchers with multiple-mediator imputation, single-mediator imputation, and product-of-coefficients regression approaches to estimate the initial disparity, disparity reduction, and disparity remaining (<doi:10.1177/00491241211067516>; <doi:10.1177/00811750231183711>). We also implement sensitivity analysis for causal decomposition using R-squared values as sensitivity parameters (Park, Kang, Lee, and Ma, 2023 <doi:10.1515/jci-2022-0031>). Finally, we include individualized causal decomposition and sensitivity analyses proposed by Park, Kang, and Lee (2025+) <doi:10.48550/arXiv.2506.19010>.
License: GPL-2
Encoding: UTF-8
Language: en-US
RoxygenNote: 7.3.2
LazyData: true
NeedsCompilation: no
Author: Suyeon Kang [aut, cre], Soojin Park [aut], Karen Xu [ctb]
Maintainer: Suyeon Kang <suyeon.kang@ucf.edu>
Packaged: 2025-08-27 15:09:48 UTC; suyeonkang
Repository: CRAN
Date/Publication: 2025-08-27 16:20:22 UTC

causal.decomp: Causal Decomposition Analysis.

Description

The causal.decomp package provides six important functions: mmi, smi, pocr, sensitivity, ind.decomp, and ind.sens.

Author(s)

Maintainer: Suyeon Kang suyeon.kang@ucf.edu

Authors:

Other contributors:


Synthetic Data for illustrating optimal treatment regimes and individualized effects

Description

A randomly generated dataset containing 2000 cases 7 columns with no missing values. The intermediate confounders are assumed to be independent of each other.

Usage

idata

Format

A data frame containing the following variables. The data are provided only for explanatory purposes.

Y:

A continuous outcome variable.

R:

A binary group indicator with a value of 0 (reference) and 1 (comparison).

M:

A binary risk factor with a value of 0 (not treated/received) and 1(treated/received).

X1:

First continuous intermediate confounder.

X2:

Second continuous intermediate confounder.

X3:

Third continuous intermediate confounder.

C:

A continuous baseline covariate.

Details

Note that all the variables are randomly generated using the simulation setting in Park, S., Kang, S., & Lee, C. (2025).

References

Park, S., Kang, S., & Lee, C. (2025). Simulation-Based Sensitivity Analysis in Optimal Treatment Regimes and Causal Decomposition with Individualized Interventions. arXiv preprint arXiv:2506.19010.


Causal Decomposition with Individualized Interventions

Description

‘ind.decomp’ estimates how much initial disparities would persist under two scenarios: (i) Everyone follows the optimal treatment regime (OTR), yielding the Individualized Controlled Direct Effect (ICDE); (ii) The proportion of individuals following the OTR is equalized across groups, yielding the Individualized Interventional Effect (IIE). This function implements the method proposed by Park, Kang, and Lee (2025+).

Usage

ind.decomp(outcome, group, group.ref, risk.factor, intermediates, moderators,
  covariates, data, B, cluster = NULL)

Arguments

outcome

a character string indicating the name of the outcome.

group

a character string indicating the name of the social group indicator such as race or gender.

group.ref

reference group of the social group indicator.

risk.factor

a character string indicating the name of the risk factor.

intermediates

vector containing the name of intermediate confounders.

moderators

a character string indicating the name of variables that have heterogeneous effects on the outcome based on the risk factor.

covariates

a vector containing the name of baseline covariate variable(s).

data

The data set for analysis (data.frame).

B

Number of bootstrapped samples in the causal decomposition analysis.

cluster

a vector of cluster indicators for the bootstrap. If provided, the cluster bootstrap is used. Default is 'NULL'.

Details

This function first estimates the initial disparity, defined as the average difference in an outcome between groups within the same levels of outcome-allowable covariates. Formally,

\tau_{c} \equiv E[Y \mid R = 1, c] - E[Y \mid R = 0, c], \quad \text{for } c \in \mathcal{C}

where R = 1 is the comparison group and R = 0 is the reference group. The term c \in \mathcal{C} represents baseline covariates.

For individualized conditional effects, disparity remaining is defined as

\zeta_{c}^{\mathrm{ICDE}}(d^{opt}) \equiv E[Y(d^{opt}) \mid R = 1, c] - E[Y(d^{opt}) \mid R = 0, c], \quad \text{for } c \in \mathcal{C}

where d^{opt} is an optimal value for risk factor M. This definition of disparity remaining states the difference in an outcome between the comparison and reference groups after setting their risk factor M to the optimal value obtained from the OTRs.

For individualized interventional effects, disparity reduction and disparity remaining due to equalizing compliance rates across groups can be expressed as follows:

\delta_{c}^{\mathrm{IIE}}(1) \equiv E[Y \mid R = 1, c] - E[Y(K) \mid R = 1, c]

\zeta_{c}^{\mathrm{IIE}}(0) \equiv E[Y(K) \mid R = 1, c] - E[Y \mid R = 0, c]

where Y(K) represents a potential outcome under the value of M that is determined by a random draw from the compliance distribution of the reference group among individuals with the same levels of target-factor-allowable covariates. Disparity reduction (\delta_{c}^{\mathrm{IIE}}(1)) represents the disparity in outcomes among a comparison group after intervening to setting the compliance rate equal to that of a reference group within the same target-factor-allowable covariate levels. Disparity remaining (\zeta_{c}^{\mathrm{IIE}}(0)) quantifies the outcome difference that persists between a comparison and reference group even after the intervention.

Value

a matrix containing the point estimates for:

  1. the initial disparity and the proportion recommended for treatment,

  2. the disparity remaining and percent reduction based on individualized conditional effects,

  3. the disparity remaining, disparity reduction, and percent reduction based on individualized intervention effects.

It also returns the nonparametric bootstrap confidence intervals for each estimate.

Author(s)

Soojin Park, University of California, Riverside, soojinp@ucr.edu; Suyeon Kang, University of Central Florida, suyeon.kang@ucf.edu; Karen Xu, University of California, Riverside, karenxu@ucr.edu.

References

Park, S., Kang, S., & Lee, C. (2025). Simulation-Based Sensitivity Analysis in Optimal Treatment Regimes and Causal Decomposition with Individualized Interventions. arXiv preprint arXiv:2506.19010.

See Also

ind.sens

Examples

data(idata)

# NOTE: We recommend using at least B = 500 boostrap replications.
results_unadj <- ind.decomp(outcome = "Y", group = "R", group.ref = "0", risk.factor = "M",
                           intermediates = c("X1", "X2", "X3"), moderators = c("X1", "X2"),
                           covariates = "C", data = idata, B = 50)
results_unadj                            

Sensitivity Analysis for Causal Decomposition with Individualized Interventions

Description

‘ind.sens’ is used to assess the sensitivity of estimates from the individualized causal decomposition to potential omitted confoudners between the risk factor M and the outcome Y, using a benchmarking approach based on the relative strength of an unmeasured confounder compared to a specified covariate. The function employs a stochastic EM algorithm to simulate the distribution of the unobserved confounder and generate bias-adjusted estimates.

Usage

ind.sens(k.y, k.m, Iternum, outcome, group, group.ref, intermediates, moderators,
  benchmark, risk.factor, covariates, data, B, cluster = NULL, nsim, mc.cores)

Arguments

k.y

scales the association of the unmeasured confounder with the outcome (numeric). A value of 1 means a one-unit change in the unmeasured confounder shifts the mean of the outcome by the same amount as a one-unit change in the benchmark covariate—conditional on the social-group indicator, intermediate confounder(s), and baseline covariate(s). Default is 1.

k.m

scales the association of the unmeasured confounder with the log-odds of risk.factor = 1 (numeric). A value of 1 means a one-unit change in the unmeasured confounder changes the log-odds of the risk factor by the same amount as the benchmark covariate, conditional on the social-group indicator, intermediate confounder(s), and baseline covariate(s). Default is 1.

Iternum

Number of iterations in the stochastic EM algorithm for generating the unmeasured confounder. default is 10.

outcome

a character string indicating the name of the outcome.

group

a character string indicating the name of the social group indicator such as race or gender.

group.ref

reference group of the social group indicator.

intermediates

vector containing the name of intermediate confounders.

moderators

a character string indicating the name of variables that have heterogeneous effects on the outcome based on the risk factor.

benchmark

a vector containing the name of the benchmark covariate used to scale k.y and k.m.

risk.factor

a character string indicating the name of the risk factor.

covariates

a vector containing the name of baseline covariate variable(s).

data

The data set for analysis (data.frame).

B

Number of bootstrapped samples in the causal decomposition analysis.

cluster

a vector of cluster indicators for the bootstrap. If provided, the cluster bootstrap is used. Default is 'NULL'.

nsim

Number of random draws of the unmeasured confounder per observation. default is 15. Increase the number for more smooth curves.

mc.cores

Number of cores to use. Must be exactly 1 on Windows. Default is 1.

Details

This function returns the adjusted point estimates and confidence intervals after accounting for a simulated unmeasured confounder U. We employ an approach that compares the coefficient of the unmeasured confounder U with that of an observed covariate X_j, after controlling for the remaining observed covariates (R, X_{-j}, and C).

Specifically, k_y indicates the extent to which the outcome Y is associated with a one unit increase in U relative to how much it is associated with a one unit change in X_j, after controlling for R, X_{-j}, and C. Likewise, k_m indicates the extent to which the odds of M = 1 are explained by unmeasured confounder U relative to how much they are explained by X_j, after controlling for R, X_{-j}, and C. These parameters (k_m and k_y) should be specified by researchers based on the assumed strength of U relative to that of the given observed covariate X_j.

Value

A matrix containing the adjusted point estimates for:

  1. the initial disparity, the proportion recommended for treatment,

  2. the disparity remaining and percent reduction based on individualized conditional effects,

  3. the disparity remaining, disparity reduction, and percent reduction based on individualized intervention effects.

It also returns the adjusted nonparametric bootstrap confidence intervals for each estimate.

Author(s)

Soojin Park, University of California, Riverside, soojinp@ucr.edu; Suyeon Kang, University of Central Florida, suyeon.kang@ucf.edu; Karen Xu, University of California, Riverside, karenxu@ucr.edu.

References

Park, S., Kang, S., & Lee, C. (2025). Simulation-Based Sensitivity Analysis in Optimal Treatment Regimes and Causal Decomposition with Individualized Interventions. arXiv preprint arXiv:2506.19010.

See Also

ind.decomp

Examples

data(idata)

results_adj <- ind.sens(k.y = 1, k.m = 1, Iternum = 5, outcome = "Y", group = "R", group.ref = "0",
                       intermediates = c("X1", "X2", "X3"), moderators = c("X1", "X2"),
                       benchmark = "X1", risk.factor = "M", covariates = "C", data = idata,
                       B = 50, cluster = NULL, nsim = 10, mc.cores = 1)
results_adj

Multiple-Mediator-Imputation Estimation Method

Description

'mmi' is used to estimate the initial disparity, disparity reduction, and disparity remaining for causal decomposition analysis, using the multiple-mediator-imputation estimation method proposed by Park et al. (2020). This estimator was originally developed to handle multiple mediators simultaneously; however, it can also be applied to a single mediator.

Usage

mmi(fit.r = NULL, fit.x, fit.y, group, covariates, sims = 100, conf.level = .95,
    conditional = TRUE, cluster = NULL, long = TRUE, mc.cores = 1L, seed = NULL)

Arguments

fit.r

a fitted model object for social group indicator (treatment). Can be of class 'CBPS' or 'SumStat'. Default is 'NULL'. Only necessary if 'conditional' is 'FALSE'.

fit.x

a fitted model object for intermediate confounder(s). Each intermediate model can be of class 'lm', 'glm', 'multinom', or 'polr'. When multiple confounders are considered, can be of class 'list' containing multiple models.

fit.y

a fitted model object for outcome. Can be of class 'lm' or 'glm'.

group

a character string indicating the name of the social group indicator such as race or gender (treatment) used in the models. The social group indicator can be categorical with two or more categories (two- or multi-valued factor).

covariates

a vector containing the name of the covariate variable(s) used in the models. Each covariate can be categorical with two or more categories (two- or multi-valued factor) or continuous (numeric).

sims

number of Monte Carlo draws for nonparametric bootstrap.

conf.level

level of the returned two-sided confidence intervals, which are estimated by the nonparametric percentile bootstrap method. Default is .95, which returns the 2.5 and 97.5 percentiles of the simulated quantities.

conditional

a logical value. If 'TRUE', the function will return the estimates conditional on those covariate values, and all covariates in mediator and outcome models need to be centered prior to fitting. Default is 'TRUE'. If 'FALSE', 'fit.r' needs to be specified.

cluster

a vector of cluster indicators for the bootstrap. If provided, the cluster bootstrap is used. Default is 'NULL'.

long

a logical value. If 'TRUE', the output will contain the entire sets of estimates for all bootstrap samples. Default is 'TRUE'.

mc.cores

The number of cores to use. Must be exactly 1 on Windows.

seed

seed number for the reproducibility of results. Default is ‘NULL’.

Details

This function returns the point estimates of the initial disparity, disparity reduction, and disparity remaining for a categorical social group indicator and a variety of types of outcome and mediator(s) in causal decomposition analysis. It also returns nonparametric percentile bootstrap confidence intervals for each estimate.

The initial disparity between two groups is defined as \tau(r,0) \equiv E[Y|R=r,c]-E[Y|R=0,c], for c \in \mathcal{C} and r \mathcal{R}, where R=0 denotes the reference group and R=r is the comparison group.

The disparity reduction, conditional on baseline covariates, measures how much the initial disparity would shrink if we simultaneously equalized the distribution of the mediators W and M across groups within each level of C. Formally, \delta(1)\equiv E[Y|R=1,c]-E[Y(G_{w|c}(0)G_{m|c}(0))|R=1,c], where G_{m|c}(0) and G_{w|c}(0) are a random draw from the reference group’s mediator M and W distribution given C, respectively.

The remaining disparity, also conditional on C, is the difference between the reference group’s observed outcome and the counterfactual outcome for the comparison group after equalizing M. Formally, \zeta(0) \equiv E[Y(Y(G_{w|c}(0)G_{m|c}(0)))|R=1, c]-E[Y|R=0, c].

The disparity reduction and remaining can be estimated using the multiple-mediator-imputation method suggested by Park et al. (2020). See the references for more details.

If one wants to make the inference conditional on baseline covariates, set 'conditional = TRUE' and center the data before fitting the models.

As of version 0.1.0, the intetmediate confounder model ('fit.x') can be of class 'lm', 'glm', 'multinom', or 'polr', corresponding respectively to the linear regression models and generalized linear models, multinomial log-linear models, and ordered response models. The outcome model ('fit.y') can be of class 'lm' or 'glm'. Also, the social group model ('fit.r') can be of class 'CBPS' or 'SumStat', both of which use the propensity score weighting. It is only necessary when 'conditional = FALSE'.

Value

result

a matrix containing the point estimates of the initial disparity, disparity remaining, and disparity reduction, and the percentile bootstrap confidence intervals for each estimate.

all.result

a matrix containing the point estimates of the initial disparity, disparity remaining, and disparity reduction for all bootstrap samples. Returned if 'long' is 'TRUE'.

Author(s)

Suyeon Kang, University of Central Florida, suyeon.kang@ucf.edu; Soojin Park, University of California, Riverside, soojinp@ucr.edu.

References

Park, S., Qin, X., & Lee, C. (2022). "Estimation and sensitivity analysis for causal decomposition in health disparity research". Sociological Methods & Research, 53(2), 571-602.

Park, S., Kang, S., & Lee, C. (2023). "Choosing an Optimal Method for Causal Decomposition Analysis with Continuous Outcomes: A Review and Simulation Study". Sociological Methodology, 54(1), 92-117.

See Also

smi

Examples

data(sdata)

#------------------------------------------------------------------------------#
# Example 1-a: Continuous Outcome
#------------------------------------------------------------------------------#
fit.m1 <- lm(M.num ~ R + C.num + C.bin, data = sdata)
fit.m2 <- glm(M.bin ~ R + C.num + C.bin, data = sdata,
          family = binomial(link = "logit"))
require(MASS)
fit.m3 <- polr(M.cat ~ R + C.num + C.bin, data = sdata)
fit.x1 <- lm(X ~ R + C.num + C.bin, data = sdata)
require(nnet)
fit.m4 <- multinom(M.cat ~ R + C.num + C.bin, data = sdata)
fit.y1 <- lm(Y.num ~ R + M.num + M.bin + M.cat + X + C.num + C.bin,
          data = sdata)

require(PSweight)
fit.r1 <- SumStat(R ~ C.num + C.bin, data = sdata, weight = "IPW")
require(CBPS)
fit.r2 <- CBPS(R ~ C.num + C.bin, data = sdata, method = "exact",
          standardize = "TRUE")

res.1a <- mmi(fit.r = fit.r1, fit.x = fit.x1,
          fit.y = fit.y1, sims = 40, conditional = FALSE,
          covariates = c("C.num", "C.bin"), group = "R", seed = 111)
res.1a

#------------------------------------------------------------------------------#
# Example 1-b: Binary Outcome
#------------------------------------------------------------------------------#
fit.y2 <- glm(Y.bin ~ R + M.num + M.bin + M.cat + X + C.num + C.bin,
          data = sdata, family = binomial(link = "logit"))

res.1b <- mmi(fit.r = fit.r1, fit.x = fit.x1,
          fit.y = fit.y2, sims = 40, conditional = FALSE,
          covariates = c("C.num", "C.bin"), group = "R", seed = 111)
res.1b

#------------------------------------------------------------------------------#
# Example 2-a: Continuous Outcome, Conditional on Covariates
#------------------------------------------------------------------------------#
# For conditional = TRUE, need to create data with centered covariates
# copy data
sdata.c <- sdata
# center quantitative covariate(s)
sdata.c$C.num <- scale(sdata.c$C.num, center = TRUE, scale = FALSE)
# center binary (or categorical) covariates(s)
# only neccessary if the desired baseline level is NOT the default baseline level.
sdata.c$C.bin <- relevel(sdata.c$C.bin, ref = "1")

# fit mediator and outcome models
fit.m1 <- lm(M.num ~ R + C.num + C.bin, data = sdata.c)
fit.m2 <- glm(M.bin ~ R + C.num + C.bin, data = sdata.c,
          family = binomial(link = "logit"))
fit.m3 <- polr(M.cat ~ R + C.num + C.bin, data = sdata.c)
fit.x2 <- lm(X ~ R + C.num + C.bin, data = sdata.c)
fit.y1 <- lm(Y.num ~ R + M.num + M.bin + M.cat + X + C.num + C.bin,
          data = sdata.c)

res.2a <- mmi(fit.x = fit.x2,
          fit.y = fit.y1, sims = 40, conditional = TRUE,
          covariates = c("C.num", "C.bin"), group = "R", seed = 111)
res.2a

#------------------------------------------------------------------------------#
# Example 2-b: Binary Outcome, Conditional on Covariates
#------------------------------------------------------------------------------#
fit.y2 <- glm(Y.bin ~ R + M.num + M.bin + M.cat + X + C.num + C.bin,
          data = sdata.c, family = binomial(link = "logit"))

res.2b <- mmi(fit.x = fit.x2,
          fit.y = fit.y2, sims = 40, conditional = TRUE,
          covariates = c("C.num", "C.bin"), group = "R", seed = 111)
res.2b

#------------------------------------------------------------------------------#
# Example 3: Case with Multiple Intermediate Confounders
#------------------------------------------------------------------------------#
fit.r <- SumStat(R ~ C, data = idata, weight = "IPW")
fit.x1 <- lm(X1 ~ R + C, data = idata)
fit.x2 <- lm(X2 ~ R + C, data = idata)
fit.x3 <- lm(X3 ~ R + C, data = idata)
fit.y <- lm(Y ~ R + M + X1 + X2 + X3 + C, data = idata)

res.3 <- mmi(fit.r = fit.r, fit.x = list(fit.x1, fit.x2, fit.x3),
         fit.y = fit.y, sims = 40, conditional = FALSE,
         covariates = "C", group = "R", seed = 111)
         res.3

Visualize sensitivity Objects

Description

S3 methods visualizing results for some objects generated by sensitivity.

Usage

## S3 method for class 'sensitivity'
plot(x, xlim = c(0, 0.3), ylim = c(0, 0.3), ...)

Arguments

x

an object of class sensitivity.

xlim

limits of the x-axis of contour plots.

ylim

limits of the y-axis of contour plots.

...

Other arguments for future usage.


Product-of-Coefficients-Regression Estimation Method

Description

'pocr' is used to estimate the initial disparity, disparity reduction, and disparity remaining for causal decomposition analysis, using the product-of-coefficients-regression estimation method proposed by Park et al. (2021+).

Usage

pocr(fit.x = NULL, fit.m, fit.y, group, covariates, sims = 100, conf.level = .95,
     cluster = NULL, long = TRUE, mc.cores = 1L, seed = NULL)

Arguments

fit.x

a fitted model object for intermediate confounder. Can be of class 'lm'. Only necessary if the mediator is categorical. Default is 'NULL'.

fit.m

a fitted model object for mediator. Can be of class 'lm', 'glm', 'multinom', or 'polr'.

fit.y

a fitted model object for outcome. Can be of class 'lm'.

group

a character string indicating the name of the social group indicator such as race or gender (treatment) used in the models. The social group indicator can be categorical with two or more categories (two- or multi-valued factor).

covariates

a vector containing the name of the covariate variable(s) used in the models. Each covariate can be categorical with two or more categories (two- or multi-valued factor) or continuous (numeric).

sims

number of Monte Carlo draws for nonparametric bootstrap.

conf.level

level of the returned two-sided confidence intervals, which are estimated by the nonparametric percentile bootstrap method. Default is to return the 2.5 and 97.5 percentiles of the simulated quantities.

cluster

a vector of cluster indicators for the bootstrap. If provided, the cluster bootstrap is used. Default is 'NULL'.

long

a logical value. If 'TRUE', the output will contain the entire sets of estimates for all bootstrap samples. Default is 'TRUE'.

mc.cores

The number of cores to use. Must be exactly 1 on Windows.

seed

seed number for the reproducibility of results. Default is ‘NULL’.

Details

This function returns the point estimates of the initial disparity, disparity reduction, and disparity remaining for a categorical social group indicator (treatment) and a variety of types of outcome and mediator(s) in causal decomposition analysis. It also returns nonparametric percentile bootstrap confidence intervals for each estimate.

The definition of the initial disparity, disparity reduction, and disparity remaining can be found in help('mmi'). As opposed to the 'mmi' and 'smi' function, this function uses the product-of-coefficients-regression method suggested by Park et al. (2021+). It always make the inference conditional on baseline covariates. Therefore, users need to center the data before fitting the models. See the reference for more details.

As of version 0.1.0, the mediator model ('fit.m') can be of class 'lm', 'glm', 'multinom', or 'polr', corresponding respectively to the linear regression models and generalized linear models, multinomial log-linear models, and ordered response models. The outcome model ('fit.y') can be of class 'lm'. The intermediate confounder model ('fit.x') can also be of class 'lm' and only necessary when the mediator is categorical.

Value

result

a matrix containing the point estimates of the initial disparity, disparity remaining, and disparity reduction, and the percentile bootstrap confidence intervals for each estimate.

all.result

a matrix containing the point estimates of the initial disparity, disparity remaining, and disparity reduction for all bootstrap samples. Returned if 'long' is 'TRUE'.

Author(s)

Suyeon Kang, University of Central Florida, suyeon.kang@ucf.edu; Soojin Park, University of California, Riverside, soojinp@ucr.edu.

References

Park, S., Kang, S., and Lee, C. (2024). "Choosing an optimal method for causal decomposition analysis with continuous outcomes: A review and simulation study", Sociological methodology, 54(1), 92-117.

See Also

mmi, smi

Examples

data(sdata)

# To be conditional on covariates, first create data with centered covariates
# copy data
sdata.c <- sdata
# center quantitative covariate(s)
sdata.c$C.num <- scale(sdata.c$C.num, center = TRUE, scale = FALSE)
# center binary (or categorical) covariates(s)
# only neccessary if the desired baseline level is NOT the default baseline level.
sdata.c$C.bin <- relevel(sdata.c$C.bin, ref = "1")

#---------------------------------------------------------------------------------#
# Example 1: Continuous Mediator
#---------------------------------------------------------------------------------#
fit.m1 <- lm(M.num ~ R + C.num + C.bin, data = sdata.c)
fit.y1 <- lm(Y.num ~ R + M.num + M.num:R + X +
          C.num + C.bin, data = sdata.c)
res1 <- pocr(fit.m = fit.m1, fit.y = fit.y1, sims = 40,
        covariates = c("C.num", "C.bin"), group = "R", seed = 111)
res1

#---------------------------------------------------------------------------------#
# Example 2: Binary Mediator
#---------------------------------------------------------------------------------#
fit.x1 <- lm(X ~ R + C.num + C.bin, data = sdata.c)
fit.m2 <- glm(M.bin ~ R + C.num + C.bin, data = sdata.c,
          family = binomial(link = "logit"))
fit.y2 <- lm(Y.num ~ R + M.bin + M.bin:R + X +
          C.num + C.bin, data = sdata.c)
res2 <- pocr(fit.x = fit.x1, fit.m = fit.m2, fit.y = fit.y2,
        sims = 40, covariates = c("C.num", "C.bin"), group = "R", seed = 111)
res2

#---------------------------------------------------------------------------------#
# Example 3: Ordinal Mediator
#---------------------------------------------------------------------------------#
require(MASS)
fit.m3 <- polr(M.cat ~ R + C.num + C.bin, data = sdata.c)
fit.y3 <- lm(Y.num ~ R + M.cat + M.cat:R + X +
	C.num + C.bin, data = sdata.c)
res3 <- pocr(fit.x = fit.x1, fit.m = fit.m3, fit.y = fit.y3,
       sims = 40, covariates = c("C.num", "C.bin"), group = "R", seed = 111)
res3

#---------------------------------------------------------------------------------#
# Example 4: Nominal Mediator
#---------------------------------------------------------------------------------#
require(nnet)
fit.m4 <- multinom(M.cat ~ R + C.num + C.bin, data = sdata.c)
res4 <- pocr(fit.x = fit.x1, fit.m = fit.m4, fit.y = fit.y3,
        sims = 40, covariates = c("C.num", "C.bin"), group = "R", seed = 111)
res4

Synthetic Data Generated Based on the Midlife Development in the U.S. (MIDUS) Study

Description

This is a synthetic dataset that includes variables from the Midlife Development in the U.S. (MIDUS) study. It has been artificially generated based on the actual MIDUS data, which is not publicly available due to confidentiality concerns. The synthetic data set consists of 1948 rows and 9 columns, with no missing values.

Usage

sMIDUS

Format

A data frame containing the following variables.

health:

cardiovascular health score.

racesex:

race-gender groups with four levels (1: White men, 2: White women, 3: Black men, 4: Black women).

lowchildSES:

socioeconomic status (SES) in the childhood.

abuse:

adverse experience in the childhood.

edu:

education level.

age:

age.

stroke:

genetic vulnerability with a value of 0 or 1.

T2DM:

genetic vulnerability with a value of 0 or 1.

heart:

genetic vulnerability with a value of 0 or 1.

Details

Note that all the variables are fabricated using the actual MIDUS data used in Park et al. (2023).

References

Park, S., Kang, S., Lee, C., & Ma, S. (2023). Sensitivity analysis for causal decomposition analysis: Assessing robustness toward omitted variable bias, Journal of Causal Inference, 11(1), 20220031.


Synthetic Data for Illustration

Description

A randomly generated dataset containing 1000 rows and 9 columns with no missing values.

Usage

sdata

Format

A data frame containing the following variables. The data are provided only for explanatory purposes. The mediators are assumed to be independent of each other.

C.num:

A quantitative covariate.

C.bin:

A binary covariates with a value of 0 or 1.

R:

A group indicator with four levels.

X:

A quantitative intermediate confounder between a mediator and the outcome.

M.num:

A quantitative mediator.

M.bin:

A binary mediator with a value of 0 or 1.

M.cat:

A categorical mediator with three levels.

Y.num:

A quantitative outcome.

Y.bin:

A binary outcome with a value of 0 or 1.

Details

Note that all the variables are randomly generated using the dataset used in Park et al. (2024).

References

Park, S., Kang, S., and Lee, C. (2024). "Choosing an optimal method for causal decomposition analysis with continuous outcomes: A review and simulation study", Sociological methodology, 54(1), 92-117.


Sensitivity Analysis Using R-Squared Values for Causal Decomposition Analysis

Description

The function 'sensitivity' is used to implement sensitivity analysis for the causal decomposition analysis, using R-squared values as sensitivity parameters.

Usage

sensitivity(boot.res, fit.y, fit.m = NULL, mediator = NULL, covariates, group,
            sel.lev.group, conf.level = 0.95, max.rsq = 0.3)

Arguments

boot.res

bootstrap results from an object fitted by the 'smi' function.

fit.y

outcome model used in fitting the 'smi' function. Can be of class 'lm' or 'glm'.

fit.m

mediator model used in fitting the 'smi' function. Can be of class 'lm', 'glm', or 'polr'.

mediator

a vector containing the name of mediator used in the models.

covariates

a vector containing the name of the covariate variable(s) used in the models. Each covariate can be categorical with two or more categories (two- or multi-valued factor) or continuous (numeric).

group

a character string indicating the name of the social group indicator such as race or gender (treatment) used in the models. The social group indicator can be categorical with two or more categories (two- or multi-valued factor).

sel.lev.group

a level of categorical social group indicator (treatment) which is to be compared with the reference level.

conf.level

level of the returned two-sided confidence intervals, which are estimated by the nonparametric percentile bootstrap method. Default is .95, which returns the 2.5 and 97.5 percentiles of the simulated quantities.

max.rsq

upper limit of the two sensitivity parameters (R-squared values). Once it is set, the R-squared values between 0 and this upper limit are explored to draw the sensitivity contour plots. Default is 0.3.

Details

This function is used to implement sensitivity analysis for the causal decomposition analysis, using two sensitivity parameters: (i) the R-squared value between the outcome and unobserved confounder given the social group indicator (treatment), intermediate confounder, mediator, and covariates; and (ii) the R-squared value between the mediator and unobserved confounder given the social group indicator (treatment), intermediate confounder, and covariates (Park et al., 2023).

As of version 0.1.0, 'boot.res' must be fitted by the 'smi' function with a single mediator, which can be of class 'lm', 'glm', or 'polr'.

Value

call

original function call.

disparity.reduction

a matrix containing the estimated disparity reduction value along with lower and upper limits of the confidence interval, for each combination of the two sensitivity parameters, assuming those two are equal.

disparity.remaining

a matrix containing the estimated disparity remaining value along with lower and upper limits of the confidence interval, for each combination of the two sensitivity parameters, assuming those two are equal.

rm

R-squared values between the mediator and unobserved confounder (first sensitivity parameter), which are explored for the sensitivity analysis.

ry

R-squared values between the outcome and unobserved confounder, (second sensitivity parameter), which are explored for the sensitivity analysis.

red

a matrix containing the estimated disparity reduction values given each combination of the two sensitivity parameters.

lower_red

a matrix containing the lower limit of disparity reduction given each combination of the two sensitivity parameters.

rem

a matrix containing the estimated disparity remaining values given each combination of the two sensitivity parameters.

lower_rem

a matrix containing the lower limit of disparity remaining given each combination of the two sensitivity parameters.

RV_red

robustness value for disparity reduction, which represents the strength of association that will explain away the estimated disparity reduction.

RV_red_alpha

robustness value for disparity reduction, which represents the strength of association that will change the significance of the estimated disparity reduction at the given significance level, assuming an equal association to the mediator and the outcome.

RV_rem

robustness value for disparity remaining, which represents the strength of association that will explain away the estimated disparity remaining.

RV_rem_alpha

robustness value for disparity remaining, which represents the strength of association that will change the significance of the estimated disparity remaining at the given significance level, assuming an equal association to the mediator and the outcome.

Author(s)

Suyeon Kang, University of Central Florida, suyeon.kang@ucf.edu; Soojin Park, University of California, Riverside, soojinp@ucr.edu.

References

Park, S., Kang, S., Lee, C., & Ma, S. (2023). Sensitivity analysis for causal decomposition analysis: Assessing robustness toward omitted variable bias, Journal of Causal Inference, 11(1), 20220031.

See Also

smi

Examples

data(sMIDUS)

# Center covariates
sMIDUS$age <- scale(sMIDUS$age, center = TRUE, scale = FALSE)
sMIDUS$stroke <- scale(sMIDUS$stroke, center = TRUE, scale = FALSE)
sMIDUS$T2DM <- scale(sMIDUS$T2DM, center = TRUE, scale = FALSE)
sMIDUS$heart <- scale(sMIDUS$heart, center = TRUE, scale = FALSE)

fit.m <- lm(edu ~ racesex + age + stroke + T2DM + heart, data = sMIDUS)
fit.y <- lm(health ~ racesex + lowchildSES + abuse + edu + racesex:edu +
            age + stroke + T2DM + heart, data = sMIDUS)
smiRes <- smi(fit.m = fit.m, fit.y = fit.y, sims = 40, conf.level = .95,
          covariates = c("age", "stroke", "T2DM", "heart"), group = "racesex", seed = 227)
sensRes <- sensitivity(boot.res = smiRes, fit.m = fit.m, fit.y = fit.y, mediator = "edu",
                       covariates = c("age", "stroke", "T2DM", "heart"), group = "racesex",
                       sel.lev.group = "4", max.rsq = 0.3)

sensRes
names(sensRes) 
# Create sensitivity contour plots
plot(sensRes)

Single-Mediator-Imputation Estimation Method

Description

'smi' is used to estimate the initial disparity, disparity reduction, and disparity remaining for causal decomposition analysis, using the single-mediator-imputation estimation method described by Park, Kang, and Lee (2024).

Usage

smi(fit.r = NULL, fit.m, fit.y, group, covariates, sims = 100, conf.level = .95,
    conditional = TRUE, cluster = NULL, long = TRUE, mc.cores = 1L, seed = NULL)

Arguments

fit.r

a fitted model object for social group indicator (treatment). Can be of class 'CBPS' or 'SumStat'. Default is 'NULL'. Only necessary if 'conditional' is 'FALSE'.

fit.m

a fitted model object for mediator. Can be of class 'lm', 'glm', 'multinom', or 'polr'.

fit.y

a fitted model object for outcome. Can be of class 'lm' or 'glm'.

group

a character string indicating the name of the social group indicator such as race or gender (treatment) used in the models. The social group indicator can be categorical with two or more categories (two- or multi-valued factor).

covariates

a vector containing the name of the covariate variable(s) used in the models. Each covariate can be categorical with two or more categories (two- or multi-valued factor) or continuous (numeric).

sims

number of Monte Carlo draws for nonparametric bootstrap.

conf.level

level of the returned two-sided confidence intervals, which are estimated by the nonparametric percentile bootstrap method. Default is .95, which returns the 2.5 and 97.5 percentiles of the simulated quantities.

conditional

a logical value. If 'TRUE', the function will return the estimates conditional on those covariate values; and all covariates in mediator and outcome models need to be centered prior to fitting. Default is 'TRUE'. If 'FALSE', 'fit.r' needs to be specified.

cluster

a vector of cluster indicators for the bootstrap. If provided, the cluster bootstrap is used. Default is 'NULL'.

long

a logical value. If 'TRUE', the output will contain the entire sets of estimates for all bootstrap samples. Default is 'TRUE'.

mc.cores

The number of cores to use. Must be exactly 1 on Windows.

seed

seed number for the reproducibility of results. Default is ‘NULL’.

Details

This function returns the point estimates of the initial disparity, disparity reduction, and disparity remaining for a categorical social group indicator (treatment) and a variety of types of outcome and mediator(s) in causal decomposition analysis. It also returns nonparametric percentile bootstrap confidence intervals for each estimate.

The definition of the initial disparity, disparity reduction, and disparity remaining can be found in help('mmi'). As opposed to the 'mmi' function, this function uses the single-mediator-imputation method suggested by Park et al. (2021+). See the reference for more details.

If one wants to make the inference conditional on baseline covariates, set 'conditional = TRUE' and center the data before fitting the models.

As of version 0.1.0, the mediator model ('fit.m') can be of class 'lm', 'glm', 'multinom', or 'polr', corresponding respectively to the linear regression models and generalized linear models, multinomial log-linear models, and ordered response models. The outcome model ('fit.y') can be of class 'lm' or 'glm'. Also, the social group (treatment) model ('fit.r') can be of class 'CBPS' or 'SumStat', both of which use the propensity score weighting. It is only necessary when 'conditional = FALSE'.

Value

result

a matrix containing the point estimates of the initial disparity, disparity remaining, and disparity reduction, and the percentile bootstrap confidence intervals for each estimate.

all.result

a matrix containing the point estimates of the initial disparity, disparity remaining, and disparity reduction for all bootstrap samples. Returned if 'long' is 'TRUE'.

alpha.r

a vector containing the estimates of the regression coefficient of the social group indicator (treatment) in the mediator. Not needed unless sensitivity analysis is conducted afterwards.

se.gamma

a vector containing the estimates of standard error of the egression coefficient of the mediator in the outcome model. Not needed unless sensitivity analysis is conducted afterwards.

Author(s)

Suyeon Kang, University of Central Florida, suyeon.kang@ucf.edu; Soojin Park, University of California, Riverside, soojinp@ucr.edu.

References

Park, S., Kang, S., and Lee, C. (2024). "Choosing an optimal method for causal decomposition analysis with continuous outcomes: A review and simulation study", Sociological methodology, 54(1), 92-117.

See Also

mmi, sensitivity

Examples

data(sdata)

#------------------------------------------------------------------------------#
# Example 1-a: Continuous Outcome
#------------------------------------------------------------------------------#
require(PSweight)
fit.r1 <- SumStat(R ~ C.num + C.bin, data = sdata, weight = "IPW")
require(CBPS)
fit.r2 <- CBPS(R ~ C.num + C.bin, data = sdata, method = "exact",
          standardize = "TRUE")

# Continuous mediator
fit.m1 <- lm(M.num ~ R + C.num + C.bin, data = sdata)
fit.y1 <- lm(Y.num ~ R + M.num + X + C.num + C.bin, data = sdata)
res.1a1 <- smi(fit.r = fit.r1, fit.m = fit.m1,
          fit.y = fit.y1, sims = 40, conditional = FALSE,
          covariates = c("C.num", "C.bin"), group = "R", seed = 32)
res.1a1

# Binary mediator
fit.m2 <- glm(M.bin ~ R + C.num + C.bin, data = sdata,
          family = binomial(link = "logit"))
fit.y2 <- lm(Y.num ~ R + M.bin + X + C.num + C.bin, data = sdata)
res.1a2 <- smi(fit.r = fit.r1, fit.m = fit.m2,
          fit.y = fit.y2, sims = 40, conditional = FALSE,
          covariates = c("C.num", "C.bin"), group = "R", seed = 111)
res.1a2

# Categorical mediator
require(MASS)
fit.m3 <- polr(M.cat ~ R + C.num + C.bin, data = sdata)
fit.y3 <- lm(Y.num ~ R + M.cat + X + C.num + C.bin, data = sdata)
res.1a3 <- smi(fit.r = fit.r1, fit.m = fit.m3,
          fit.y = fit.y3, sims = 40, conditional = FALSE,
          covariates = c("C.num", "C.bin"), group = "R", seed = 111)
res.1a3

require(nnet)
fit.m4 <- multinom(M.cat ~ R + C.num + C.bin, data = sdata)
res.1a4 <- smi(fit.r = fit.r1, fit.m = fit.m4,
          fit.y = fit.y3, sims = 40, conditional = FALSE,
          covariates = c("C.num", "C.bin"), group = "R", seed = 111)
res.1a4

#------------------------------------------------------------------------------#
# Example 1-b: Binary Outcome
#------------------------------------------------------------------------------#
# Continuous mediator
fit.y1 <- glm(Y.bin ~ R + M.num + X + C.num + C.bin,
          data = sdata, family = binomial(link = "logit"))
res.1b1 <- smi(fit.r = fit.r1, fit.m = fit.m1,
          fit.y = fit.y1, sims = 40, conditional = FALSE,
          covariates = c("C.num", "C.bin"), group = "R", seed = 32)
res.1b1

# Binary mediator
fit.y2 <- glm(Y.bin ~ R + M.bin + X + C.num + C.bin,
          data = sdata, family = binomial(link = "logit"))
res.1b2 <- smi(fit.r = fit.r1, fit.m = fit.m2,
          fit.y = fit.y2, sims = 40, conditional = FALSE,
          covariates = c("C.num", "C.bin"), group = "R", seed = 111)
res.1b2

# Categorical mediator
fit.y3 <- glm(Y.bin ~ R + M.cat + X + C.num + C.bin,
          data = sdata, family = binomial(link = "logit"))
res.1b3 <- smi(fit.r = fit.r1, fit.m = fit.m3,
          fit.y = fit.y3, sims = 40, conditional = FALSE,
          covariates = c("C.num", "C.bin"), group = "R", seed = 111)
res.1b3

res.1b4 <- smi(fit.r = fit.r1, fit.m = fit.m4,
          fit.y = fit.y3, sims = 40, conditional = FALSE,
          covariates = c("C.num", "C.bin"), group = "R", seed = 111)
res.1b4

#---------------------------------------------------------------------------------#
# Example 2-a: Continuous Outcome, Conditional on Covariates
#---------------------------------------------------------------------------------#
# For conditional = T, need to create data with centered covariates
# copy data
sdata.c <- sdata
# center quantitative covariate(s)
sdata.c$C.num <- scale(sdata.c$C.num, center = TRUE, scale = FALSE)
# center binary (or categorical) covariates(s)
# only neccessary if the desired baseline level is NOT the default baseline level.
sdata.c$C.bin <- relevel(sdata.c$C.bin, ref = "1")

# Continuous mediator
fit.y1 <- lm(Y.num ~ R + M.num + X + C.num + C.bin, data = sdata.c)
fit.m1 <- lm(M.num ~ R + C.num + C.bin, data = sdata.c)
res.2a1 <- smi(fit.m = fit.m1,
          fit.y = fit.y1, sims = 40, conditional = TRUE,
          covariates = c("C.num", "C.bin"), group = "R", seed = 111)
res.2a1

# Binary mediator
fit.y2 <- lm(Y.num ~ R + M.bin + X + C.num + C.bin, data = sdata.c)
fit.m2 <- glm(M.bin ~ R + C.num + C.bin, data = sdata.c,
          family = binomial(link = "logit"))
res.2a2 <- smi(fit.m = fit.m2,
          fit.y = fit.y2, sims = 40, conditional = TRUE,
          covariates = c("C.num", "C.bin"), group = "R", seed = 111)
res.2a2

# Categorical mediator
fit.y3 <- lm(Y.num ~ R + M.cat + X + C.num + C.bin, data = sdata.c)
fit.m3 <- polr(M.cat ~ R + C.num + C.bin, data = sdata.c)
res.2a3 <- smi(fit.m = fit.m3,
          fit.y = fit.y3, sims = 40, conditional = TRUE,
          covariates = c("C.num", "C.bin"), group = "R", seed = 111)
res.2a3

fit.m4 <- multinom(M.cat ~ R + C.num + C.bin, data = sdata.c)
res.2a4 <- smi(fit.m = fit.m4,
          fit.y = fit.y3, sims = 40, conditional = TRUE,
          covariates = c("C.num", "C.bin"), group = "R", seed = 111)
res.2a4

#------------------------------------------------------------------------------#
# Example 2-b: Binary Outcome, Conditional on Covariates
#------------------------------------------------------------------------------#
# Continuous mediator
fit.y1 <- glm(Y.bin ~ R + M.num + X + C.num + C.bin,
          data = sdata.c, family = binomial(link = "logit"))
res.2b1 <- smi(fit.m = fit.m1,
          fit.y = fit.y1, sims = 40, conditional = TRUE,
          covariates = c("C.num", "C.bin"), group = "R", seed = 111)
res.2b1

# Binary mediator
fit.y2 <- glm(Y.bin ~ R + M.bin + X + C.num + C.bin,
          data = sdata.c, family = binomial(link = "logit"))
res.2b2 <- smi(fit.m = fit.m2,
          fit.y = fit.y2, sims = 40, conditional = TRUE,
          covariates = c("C.num", "C.bin"), group = "R", seed = 111)
res.2b2

# Categorical mediator
fit.y3 <- glm(Y.bin ~ R + M.cat + X + C.num + C.bin,
          data = sdata.c, family = binomial(link = "logit"))
res.2b3 <- smi(fit.m = fit.m3,
          fit.y = fit.y3, sims = 40, conditional = TRUE,
          covariates = c("C.num", "C.bin"), group = "R", seed = 111)
res.2b3

res.2b4 <- smi(fit.m = fit.m4,
          fit.y = fit.y3, sims = 40, conditional = TRUE,
          covariates = c("C.num", "C.bin"), group = "R", seed = 111)
res.2b4