| Encoding: | UTF-8 |
| Version: | 0.5-8 |
| Title: | Useful Tools for Structural Equation Modeling |
| Description: | Provides miscellaneous tools for structural equation modeling, many of which extend the 'lavaan' package. For example, latent interactions can be estimated using product indicators (Lin et al., 2010, <doi:10.1080/10705511.2010.488999>) and simple effects probed; analytical power analyses can be conducted (Jak et al., 2021, <doi:10.3758/s13428-020-01479-0>); and scale reliability can be estimated based on estimated factor-model parameters. |
| Depends: | R(≥ 4.0), lavaan(≥ 0.6-21), methods |
| Imports: | graphics, pbivnorm, stats, utils |
| Suggests: | blavaan, emmeans, lavaan.mi, MASS, mice, mnormt, parallel, restriktor, testthat |
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
| LazyData: | yes |
| LazyLoad: | yes |
| URL: | https://github.com/simsem/semTools/wiki |
| BugReports: | https://github.com/simsem/semTools/issues |
| Date: | 2026-02-14 |
| RoxygenNote: | 7.3.3 |
| NeedsCompilation: | no |
| Packaged: | 2026-02-14 06:36:27 UTC; terrence |
| Author: | Terrence D. Jorgensen |
| Maintainer: | Terrence D. Jorgensen <TJorgensen314@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-02-14 07:40:02 UTC |
semTools: Useful Tools for Structural Equation Modeling
Description
The semTools package provides many miscellaneous functions that are useful for statistical analysis involving SEM in R. Many functions extend the functionality of the lavaan package. Some sets of functions in semTools correspond to the same theme; we call such a collection of functions a suite. Our suites include:
Model Fit Evaluation:
moreFitIndices(), nullRMSEA(), singleParamTest(), epcEquivFit(), and chisqSmallN()
Measurement Invariance:
measEq.syntax(), partialInvariance(), partialInvarianceCat(), and permuteMeasEq()
Power Analysis:
SSpower(), findRMSEApower(), plotRMSEApower(), plotRMSEAdist(), findRMSEAsamplesize(), findRMSEApowernested(), plotRMSEApowernested(), and findRMSEAsamplesizenested()
Missing Data Analysis:
auxiliary(), twostage(), fmi(), bsBootMiss(), quark(), and combinequark()
Latent Interactions:
indProd(), orthogonalize(), probe2WayMC(), probe3WayMC(), probe2WayRC(), probe3WayRC(), and plotProbe()
Exploratory Factor Analysis (EFA):
efa.ekc()
Reliability Estimation:
compRelSEM() and maximalRelia() (see also AVE())
Parceling:
parcelAllocation(), PAVranking(), and poolMAlloc()
Non-Normality:
skew(), kurtosis(), mardiaSkew(), mardiaKurtosis(), and mvrnonnorm()
All users of R (or SEM) are invited to submit functions or ideas for
functions by contacting the maintainer, Terrence Jorgensen
(TJorgensen314@gmail.com). Contributors are encouraged to use
Roxygen comments to document their contributed code, which is
consistent with the rest of semTools. Read the vignette from the
roxygen2 package for details:
vignette("rd", package = "roxygen2")
Calculate average variance extracted
Description
Calculate average variance extracted (AVE) per factor from lavaan object
Usage
AVE(object, obs.var = TRUE, omit.imps = c("no.conv", "no.se"),
omit.factors = character(0), dropSingle = TRUE, return.df = TRUE)
Arguments
object |
A lavaan::lavaan or lavaan.mi::lavaan.mi object,
expected to contain only exogenous common factors (i.e., a CFA model).
Cross-loadings are not allowed and will result in NA for those factors (see Details). |
obs.var |
|
omit.imps |
|
omit.factors |
|
dropSingle |
|
return.df |
|
Details
The average variance extracted (AVE) can be calculated by
AVE = \frac{\bold{1}^\prime
\textrm{diag}\left(\Lambda\Psi\Lambda^\prime\right)\bold{1}}{\bold{1}^\prime
\textrm{diag}\left(\hat{\Sigma}\right) \bold{1}},
Note that this formula modifies the one proposed by Fornell & Larcker (1981), which assumed that all factor variances equal 1; the version implemented here also accommodates freely estimated factor variances. AVE is not provided for factors whose indicators have dual (cross-) loadings, because AVE is a property of a set of items rather than of a factor. When ordinal indicators are used, AVE is calculated from polychoric correlations.
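For example, the formula can be verified by hand from a fitted lavaan CFA. The lines below are a minimal sketch, assuming 'fit' is the single-group CFA fitted in the Examples below (the indicator names and the "visual" factor are specific to that model):
EST <- lavaan::lavInspect(fit, "est")
imp <- EST$lambda %*% EST$psi %*% t(EST$lambda)  # common-factor part of Sigma-hat
obs <- lavaan::lavInspect(fit, "sampstat")$cov   # observed (unconstrained) covariance matrix
idx <- c("x1","x2","x3")                         # indicators of the "visual" factor
sum(diag(imp[idx, idx])) / sum(diag(obs[idx, idx]))  # should match AVE(fit)["visual"] with the default obs.var = TRUE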
Value
numeric vector of average variance extracted from indicators
per factor. For models with multiple "blocks" (any combination of groups
and levels), vectors may be returned as columns in a data.frame
with additional columns indicating the group/level (see return.df=
argument description for caveat).
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Fornell, C., & Larcker, D. F. (1981). Evaluating structural equation models with unobservable variables and measurement errors. Journal of Marketing Research, 18(1), 39–50. doi:10.2307/3151312
See Also
compRelSEM() for composite reliability estimates
Examples
data(HolzingerSwineford1939)
HS9 <- HolzingerSwineford1939[ , c("x7","x8","x9")]
HSbinary <- as.data.frame( lapply(HS9, cut, 2, labels=FALSE) )
names(HSbinary) <- c("y7","y8","y9")
HS <- cbind(HolzingerSwineford1939, HSbinary)
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ y7 + y8 + y9 '
fit <- cfa(HS.model, data = HS, ordered = c("y7","y8","y9"), std.lv = TRUE)
## works for factors with exclusively continuous OR categorical indicators
AVE(fit) # uses observed (or unconstrained polychoric/polyserial) by default
AVE(fit, obs.var = FALSE)
## works for multigroup models and for multilevel models (and both)
data(Demo.twolevel)
## assign clusters to arbitrary groups
Demo.twolevel$g <- ifelse(Demo.twolevel$cluster %% 2L, "type1", "type2")
model2 <- ' group: type1
level: within
fac =~ y1 + L2*y2 + L3*y3
level: between
fac =~ y1 + L2*y2 + L3*y3
group: type2
level: within
fac =~ y1 + L2*y2 + L3*y3
level: between
fac =~ y1 + L2*y2 + L3*y3
'
fit2 <- sem(model2, data = Demo.twolevel, cluster = "cluster", group = "g")
AVE(fit2)
Class For the Results of Bollen-Stine Bootstrap with Incomplete Data
Description
This class contains the results of Bollen-Stine bootstrap with missing data.
Usage
## S4 method for signature 'BootMiss'
show(object)
## S4 method for signature 'BootMiss'
summary(object)
## S4 method for signature 'BootMiss'
hist(x, ..., alpha = 0.05, nd = 2,
printLegend = TRUE, legendArgs = list(x = "topleft"))
Arguments
object, x |
object of class |
... |
Additional arguments to pass to |
alpha |
alpha level used to draw confidence limits |
nd |
number of digits to display |
printLegend |
|
legendArgs |
|
Value
The hist method returns a list of length 2,
containing the arguments for the call to hist() and the arguments
for the call to legend(), respectively.
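For example (a hedged sketch; 'bootOut' stands for any BootMiss object, such as the one created in the bsBootMiss() example):
plotArgs <- hist(bootOut)  # draws the plot and returns the two lists of arguments
str(plotArgs[[1]])         # arguments used in the call to hist()
str(plotArgs[[2]])         # arguments used in the call to legend()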
Slots
time: A list containing 2 difftime objects (transform and fit), indicating the time elapsed for data transformation and for fitting the model to bootstrap data sets, respectively.
transData: The transformed data.
bootDist: The vector of chi^2 values from bootstrap data sets fitted by the target model.
origChi: The chi^2 value from the original data set.
df: The degrees of freedom of the model.
bootP: The p value comparing the original chi^2 with the bootstrap distribution.
Objects from the Class
Objects can be created via the
bsBootMiss() function.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
See Also
Examples
# See the example from the bsBootMiss function
Class For Representing A Template of Model Fit Comparisons
Description
This class contains model fit measures and model fit comparisons among multiple models
Usage
## S4 method for signature 'FitDiff'
show(object)
## S4 method for signature 'FitDiff'
summary(object, fit.measures = "default", nd = 3,
tag = "†")
Arguments
object |
object of class |
fit.measures |
|
nd |
number of digits printed |
tag |
single |
Slots
name: character. The name of each model.
model.class: character. One class to which each model belongs.
nested: data.frame. Model fit comparisons between adjacently nested models that are ordered by their degrees of freedom (df).
fit: data.frame. Fit measures of all models specified in the name slot, ordered by their df.
fit.diff: data.frame. Sequential differences in fit measures in the fit slot.
Objects from the Class
Objects can be created via the
compareFit() function.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
Sunthud Pornprasertmanit (psunthud@gmail.com)
See Also
Examples
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '
fit.config <- cfa(HS.model, data = HolzingerSwineford1939, group = "school")
## invariance constraints
fit.metric <- cfa(HS.model, data = HolzingerSwineford1939, group = "school",
group.equal = "loadings")
fit.scalar <- cfa(HS.model, data = HolzingerSwineford1939, group = "school",
group.equal = c("loadings","intercepts"))
fit.strict <- cfa(HS.model, data = HolzingerSwineford1939, group = "school",
group.equal = c("loadings","intercepts","residuals"))
measEqOut <- compareFit(fit.config, fit.metric, fit.scalar, fit.strict)
summary(measEqOut)
summary(measEqOut, fit.measures = "all")
summary(measEqOut, fit.measures = c("aic", "bic"))
if(interactive()){
## Save results to a file
saveFile(measEqOut, file = "measEq.txt")
## Copy to a clipboard
clipboard(measEqOut)
}
Class For the Result of Nesting and Equivalence Testing
Description
This class contains the results of nesting and equivalence testing among multiple models
Usage
## S4 method for signature 'Net'
show(object)
## S4 method for signature 'Net'
summary(object)
Arguments
object |
An object of class |
Value
show |
|
summary |
|
Slots
test: Logical matrix indicating nesting/equivalence among models.
df: The degrees of freedom of tested models.
Objects from the Class
Objects can be created via the
net() function.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
See Also
Examples
# See the example in the net function.
Parcel-Allocation Variability in Model Ranking
Description
This function quantifies and assesses the consequences of parcel-allocation
variability for model ranking of structural equation models (SEMs) that
differ in their structural specification but share the same parcel-level
measurement specification (see Sterba & Rights, 2016). This function calls
parcelAllocation()—which can be used with only one SEM in
isolation—to fit two (assumed) nested models to each of a specified number
of random item-to-parcel allocations. Output includes summary information
about the distribution of model selection results (including plots) and the
distribution of results for each model individually, across allocations
within-sample. Note that this function can be used when selecting among more
than two competing structural models as well (see instructions below
involving the seed= argument).
Usage
PAVranking(model0, model1, data, parcel.names, item.syntax, nAlloc = 100,
fun = "sem", alpha = 0.05, bic.crit = 10, fit.measures = c("chisq",
"df", "cfi", "tli", "rmsea", "srmr", "logl", "aic", "bic", "bic2"), ...,
show.progress = FALSE, iseed = 12345, warn = FALSE)
Arguments
model0, model1 |
|
data |
A |
parcel.names |
|
item.syntax |
|
nAlloc |
The number of random items-to-parcels allocations to generate. |
fun |
|
alpha |
Alpha level used as criterion for significance. |
bic.crit |
Criterion for assessing evidence in favor of one model over another. See Raftery (1995) for guidelines (default is "very strong evidence" in favor of the model with lower BIC). |
fit.measures |
|
... |
Additional arguments to be passed to
|
show.progress |
If |
iseed |
(Optional) Random seed used for parceling items. When the same
random seed is specified and the program is re-run, the same allocations
will be generated. The seed argument can be used to assess parcel-allocation
variability in model ranking when considering more than two models. For each
pair of models under comparison, the program should be rerun using the same
random seed. Doing so ensures that multiple model comparisons will employ
the same set of parcel datasets. Note: When using parallel
options, you must first type |
warn |
Whether to print warnings when fitting models to each allocation |
Details
This is based on a SAS macro ParcelAlloc (Sterba & MacCallum, 2010).
The PAVranking() function produces results discussed in Sterba and
Rights (2016) relevant to the assessment of parcel-allocation variability in
model selection and model ranking. Specifically, the PAVranking()
function first calls parcelAllocation() to generate a given
number (nAlloc=) of item-to-parcel allocations, fitting both specified
models to each allocation, and providing summaries of PAV for each model.
Additionally, PAVranking() provides the following new summaries:
PAV in model selection index values and model ranking between Models model0= and model1=.
The proportion of allocations that converged and the proportion of proper solutions (results are summarized only for allocations that both converged and yielded proper solutions).
For further details on the benefits of the random allocation of items to parcels, see Sterba (2011) and Sterba and MacCallum (2010).
To test whether nested models have equivalent fit, results can be pooled across allocations using the same methods available for pooling results across multiple imputations of missing data (see Examples).
Note: This function requires the lavaan package. Missing data
must be coded as NA. If the function returns "Error in plot.new() : figure margins too large", the user may need to increase the
size of the plot window (e.g., in RStudio) and rerun the function.
Value
A list with 3 elements. The first two (model0.results and
model1.results) are results returned by parcelAllocation()
for model0 and model1, respectively.
The third element (model0.v.model1) is a list of
model-comparison results, including the following:
LRT_Summary: |
The average likelihood ratio test across allocations, as well as the SD, minimum, maximum, range, and the proportion of allocations for which the test was significant. |
Fit_Index_Differences: |
Differences in fit indices, organized by what proportion favored each model and, among those, what the average difference was. |
Favored_by_BIC: |
The proportion of allocations in which each model met the criterion (see the bic.crit= argument) for being favored by BIC. |
Convergence_Summary: |
The proportion of allocations in which each model (and both models) converged on a solution. |
Histograms are also printed to the current plot-output device.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Raftery, A. E. (1995). Bayesian model selection in social research. Sociological Methodology, 25, 111–163. doi:10.2307/271063
Sterba, S. K. (2011). Implications of parcel-allocation variability for comparing fit of item-solutions and parcel-solutions. Structural Equation Modeling, 18(4), 554–577. doi:10.1080/10705511.2011.607073
Sterba, S. K., & MacCallum, R. C. (2010). Variability in parameter estimates and model fit across repeated allocations of items to parcels. Multivariate Behavioral Research, 45(2), 322–358. doi:10.1080/00273171003680302
Sterba, S. K., & Rights, J. D. (2016). Accounting for parcel-allocation variability in practice: Combining sources of uncertainty and choosing the number of allocations. Multivariate Behavioral Research, 51(2–3), 296–313. doi:10.1080/00273171.2016.1144502
Sterba, S. K., & Rights, J. D. (2017). Effects of parceling on model selection: Parcel-allocation variability in model ranking. Psychological Methods, 22(1), 47–68. doi:10.1037/met0000067
See Also
parcelAllocation() for fitting a single model,
poolMAlloc() for choosing the number of allocations
Examples
## Specify the item-level model (if NO parcels were created)
## This must apply to BOTH competing models
item.syntax <- c(paste0("f1 =~ f1item", 1:9),
paste0("f2 =~ f2item", 1:9))
cat(item.syntax, sep = "\n")
## Below, we reduce the size of this same model by
## applying different parceling schemes
## Specify a 2-factor CFA with correlated factors, using 3-indicator parcels
mod1 <- '
f1 =~ par1 + par2 + par3
f2 =~ par4 + par5 + par6
'
## Specify a more restricted model with orthogonal factors
mod0 <- '
f1 =~ par1 + par2 + par3
f2 =~ par4 + par5 + par6
f1 ~~ 0*f2
'
## names of parcels (must apply to BOTH models)
(parcel.names <- paste0("par", 1:6))
## override default random-number generator to use parallel options
RNGkind("L'Ecuyer-CMRG")
PAVranking(model0 = mod0, model1 = mod1, data = simParcel, nAlloc = 100,
parcel.names = parcel.names, item.syntax = item.syntax,
# parallel = "multicore", # parallel available on Mac/Linux
std.lv = TRUE) # any additional lavaan arguments
## POOL RESULTS by treating parcel allocations as multiple imputations.
## Details provided in Sterba & Rights (2016); see ?poolMAlloc.
## save list of data sets instead of fitting model yet
dataList <- parcelAllocation(mod0, # or mod1 (either uses same allocations)
data = simParcel, nAlloc = 100,
parcel.names = parcel.names,
item.syntax = item.syntax,
do.fit = FALSE)
## now fit each model to each data set
if(requireNamespace("lavaan.mi")){
library(lavaan.mi)
fit0 <- cfa.mi(mod0, data = dataList, std.lv = TRUE)
fit1 <- cfa.mi(mod1, data = dataList, std.lv = TRUE)
anova(fit0, fit1) # Pooled test statistic comparing models.
help(package = "lavaan.mi") # Find more methods for pooling results.
}
Power for model parameters
Description
Apply the Satorra & Saris (1985) method for chi-squared power analysis.
Usage
SSpower(powerModel, n, nparam, popModel, mu, Sigma, fun = "sem",
alpha = 0.05, ...)
Arguments
powerModel |
lavaan |
n |
|
nparam |
|
popModel |
lavaan |
mu |
|
Sigma |
|
fun |
character. Name of |
alpha |
Type I error rate used to set a criterion for rejecting H0. |
... |
additional arguments to pass to |
Details
Specify all non-zero parameters in a population model, either by using
lavaan syntax (popModel) or by submitting a population covariance
matrix (Sigma) and optional mean vector (mu) implied by the
population model. Then specify an analysis model that places at least
one invalid constraint (note the number in the nparam argument).
There is also a Shiny app called "power4SEM" that provides a graphical user interface for this functionality (Jak et al., 2021). It can be accessed at https://sjak.shinyapps.io/power4SEM/.
Author(s)
Alexander M. Schoemann (East Carolina University; schoemanna@ecu.edu)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Satorra, A., & Saris, W. E. (1985). Power of the likelihood ratio test in covariance structure analysis. Psychometrika, 50(1), 83–90. doi:10.1007/BF02294150
Jak, S., Jorgensen, T. D., Verdam, M. G., Oort, F. J., & Elffers, L. (2021). Analytical power calculations for structural equation modeling: A tutorial and Shiny app. Behavior Research Methods, 53, 1385–1406. doi:10.3758/s13428-020-01479-0
Examples
## Specify population values. Note every parameter has a fixed value.
modelP <- '
f1 =~ .7*V1 + .7*V2 + .7*V3 + .7*V4
f2 =~ .7*V5 + .7*V6 + .7*V7 + .7*V8
f1 ~~ .3*f2
f1 ~~ 1*f1
f2 ~~ 1*f2
V1 ~~ .51*V1
V2 ~~ .51*V2
V3 ~~ .51*V3
V4 ~~ .51*V4
V5 ~~ .51*V5
V6 ~~ .51*V6
V7 ~~ .51*V7
V8 ~~ .51*V8
'
## Specify analysis model. Note parameter of interest f1~~f2 is fixed to 0.
modelA <- '
f1 =~ V1 + V2 + V3 + V4
f2 =~ V5 + V6 + V7 + V8
f1 ~~ 0*f2
'
## Calculate power
SSpower(powerModel = modelA, popModel = modelP, n = 150, nparam = 1,
std.lv = TRUE)
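## A hedged sketch: instead of population-model syntax, the population moments
## can be supplied directly via the Sigma= (and optionally mu=) arguments.
## Sigma below is constructed by hand from the fixed values in modelP, so the
## result should match the popModel-based call above.
Lambda <- cbind(c(rep(.7, 4), rep(0, 4)),
                c(rep(0, 4), rep(.7, 4)))
Phi    <- matrix(c(1, .3, .3, 1), nrow = 2)
Theta  <- diag(.51, nrow = 8)
Sigma  <- Lambda %*% Phi %*% t(Lambda) + Theta
dimnames(Sigma) <- list(paste0("V", 1:8), paste0("V", 1:8))
SSpower(powerModel = modelA, Sigma = Sigma, n = 150, nparam = 1, std.lv = TRUE)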
## Get power for a range of sample sizes
Ns <- seq(100, 500, 40)
Power <- rep(NA, length(Ns))
for(i in 1:length(Ns)) {
Power[i] <- SSpower(powerModel = modelA, popModel = modelP,
n = Ns[i], nparam = 1, std.lv = TRUE)
}
plot(x = Ns, y = Power, type = "l", xlab = "Sample Size")
## Optionally specify different values for multiple populations
modelP2 <- '
f1 =~ .7*V1 + .7*V2 + .7*V3 + .7*V4
f2 =~ .7*V5 + .7*V6 + .7*V7 + .7*V8
f1 ~~ c(-.3, .3)*f2 # DIFFERENT ACROSS GROUPS
f1 ~~ 1*f1
f2 ~~ 1*f2
V1 ~~ .51*V1
V2 ~~ .51*V2
V3 ~~ .51*V3
V4 ~~ .51*V4
V5 ~~ .51*V5
V6 ~~ .51*V6
V7 ~~ .51*V7
V8 ~~ .51*V8
'
modelA2 <- '
f1 =~ V1 + V2 + V3 + V4
f2 =~ V5 + V6 + V7 + V8
f1 ~~ c(psi21, psi21)*f2 # EQUALITY CONSTRAINT ACROSS GROUPS
'
## Calculate power
SSpower(powerModel = modelA2, popModel = modelP2, n = c(100, 100), nparam = 1,
std.lv = TRUE)
## Get power for a range of sample sizes
Ns2 <- cbind(Group1 = seq(10, 100, 10), Group2 = seq(10, 100, 10))
Power2 <- apply(Ns2, MARGIN = 1, FUN = function(nn) {
SSpower(powerModel = modelA2, popModel = modelP2, n = nn,
nparam = 1, std.lv = TRUE)
})
plot(x = rowSums(Ns2), y = Power2, type = "l", xlab = "Total Sample Size",
ylim = 0:1)
abline(h = c(.8, .9), lty = c("dotted","dashed"))
legend("bottomright", c("80% Power","90% Power"), lty = c("dotted","dashed"))
Implement Saturated Correlates with FIML
Description
Automatically add auxiliary variables to a lavaan model when using full information maximum likelihood (FIML) to handle missing data
Usage
auxiliary(model, data, aux, fun, ..., envir = getNamespace("lavaan"),
return.syntax = FALSE)
lavaan.auxiliary(model, data, aux, ..., envir = getNamespace("lavaan"))
cfa.auxiliary(model, data, aux, ..., envir = getNamespace("lavaan"))
sem.auxiliary(model, data, aux, ..., envir = getNamespace("lavaan"))
growth.auxiliary(model, data, aux, ..., envir = getNamespace("lavaan"))
Arguments
model |
The analysis model can be specified with 1 of 2 objects:
|
data |
|
aux |
|
fun |
|
... |
Additional arguments to pass to |
envir |
Passed to |
return.syntax |
|
Details
These functions are wrappers around the corresponding lavaan functions.
You can use them the same way you use lavaan::lavaan(), but you
must pass your full data.frame to the data argument.
Because the saturated-correlates approach (Enders, 2008) treats exogenous
variables as random, fixed.x must be set to FALSE. Because FIML
requires continuous data (although nonnormality corrections can still be
requested), neither the variables in the model nor the auxiliary variables
specified in aux can be declared as ordered.
Value
a fitted lavaan::lavaan object. Additional
information is stored as a list in the @external slot:
- baseline.model: a fitted lavaan::lavaan object. Results of fitting an appropriate independence model for the calculation of incremental fit indices (e.g., CFI, TLI), in which the auxiliary variables remain saturated, so only the target variables are constrained to be orthogonal. See Examples for how to send this baseline model to lavaan::fitMeasures().
- aux: The character vector of auxiliary variable names.
- baseline.syntax: A character vector generated within the auxiliary function, specifying the baseline.model syntax.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Enders, C. K. (2008). A note on the use of missing auxiliary variables in full information maximum likelihood-based structural equation models. Structural Equation Modeling, 15(3), 434–448. doi:10.1080/10705510802154307
Examples
dat1 <- lavaan::HolzingerSwineford1939
set.seed(12345)
dat1$z <- rnorm(nrow(dat1))
dat1$x5 <- ifelse(dat1$z < quantile(dat1$z, .3), NA, dat1$x5)
dat1$x9 <- ifelse(dat1$z > quantile(dat1$z, .8), NA, dat1$x9)
targetModel <- "
visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
"
## works just like cfa(), but with an extra "aux" argument
fitaux1 <- cfa.auxiliary(targetModel, data = dat1, aux = "z",
missing = "fiml", estimator = "mlr")
## with multiple auxiliary variables and multiple groups
fitaux2 <- cfa.auxiliary(targetModel, data = dat1, aux = c("z","ageyr","grade"),
group = "school", group.equal = "loadings")
## calculate correct incremental fit indices (e.g., CFI, TLI)
fitMeasures(fitaux2, fit.measures = c("cfi","tli"))
## NOTE: lavaan will use the internally stored baseline model, which
## is the independence model plus saturated auxiliary parameters
lavInspect(fitaux2@external$baseline.model, "free")
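## A hedged sketch: the generic auxiliary() wrapper is equivalent when the
## lavaan function is named via its fun= argument
fitaux3 <- auxiliary(targetModel, data = dat1, aux = "z", fun = "cfa",
                     missing = "fiml")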
Bollen-Stine Bootstrap with the Existence of Missing Data
Description
Implement Bollen and Stine's (1992) bootstrap when missing observations
exist. The implemented method was proposed by Savalei and Yuan (2009). This
can be used in two ways. The first and easiest option is to fit the model to
incomplete data in lavaan using the FIML estimator, then pass that
lavaan object to bsBootMiss().
Usage
bsBootMiss(x, transformation = 2, nBoot = 500, model, rawData, Sigma, Mu,
group, ChiSquared, EMcov, writeTransData = FALSE, transDataOnly = FALSE,
writeBootData = FALSE, bootSamplesOnly = FALSE, writeArgs, seed = NULL,
suppressWarn = TRUE, showProgress = TRUE, ...)
Arguments
x |
A target |
transformation |
The transformation methods in Savalei and Yuan (2009).
There are three methods in the article, but only the first two are currently
implemented here. Use |
nBoot |
The number of bootstrap samples. |
model |
Optional. The target model if |
rawData |
Optional. The target raw data set if |
Sigma |
Optional. The model-implied covariance matrix if |
Mu |
Optional. The model-implied mean vector if |
group |
Optional character string specifying the name of the grouping
variable in |
ChiSquared |
Optional. The model's |
EMcov |
Optional, if |
writeTransData |
Logical. If |
transDataOnly |
Logical. If |
writeBootData |
Logical. If |
bootSamplesOnly |
Logical. If |
writeArgs |
Optional |
seed |
The seed number used in randomly drawing bootstrap samples. |
suppressWarn |
Logical. If |
showProgress |
Logical. Indicating whether to display a progress bar while fitting models to bootstrap samples. |
... |
The additional arguments in the |
Details
The second option is designed for users of other software packages (e.g., LISREL,
EQS, Amos, or Mplus). Users can import their data, \chi^2 value, and
model-implied moments from another package, and they have the option of
saving (or writing to a file) either the transformed data or bootstrapped
samples of that data, which can be analyzed in other programs. In order to
analyze the bootstrapped samples and return a p value, users of other
programs must still specify their model using lavaan syntax.
Value
As a default, this function returns a BootMiss
object containing the results of the bootstrap samples. Use show,
summary, or hist to examine the results. Optionally, the
transformed data set is returned if transDataOnly = TRUE. Optionally,
the bootstrap data sets are returned if bootSamplesOnly = TRUE.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
Syntax for transformations borrowed from supplementary materials in Savalei & Yuan (2009)
References
Bollen, K. A., & Stine, R. A. (1992). Bootstrapping goodness-of-fit measures in structural equation models. Sociological Methods & Research, 21(2), 205–229. doi:10.1177/0049124192021002004
Savalei, V., & Yuan, K.-H. (2009). On the model-based bootstrap with missing data: Obtaining a p-value for a test of exact fit. Multivariate Behavioral Research, 44(6), 741–763. doi:10.1080/00273170903333590
See Also
Examples
dat1 <- HolzingerSwineford1939
dat1$x5 <- ifelse(dat1$x1 <= quantile(dat1$x1, .3), NA, dat1$x5)
dat1$x9 <- ifelse(is.na(dat1$x5), NA, dat1$x9)
targetModel <- "
visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
"
targetFit <- sem(targetModel, dat1, meanstructure = TRUE, std.lv = TRUE,
missing = "fiml", group = "school")
summary(targetFit, fit = TRUE, standardized = TRUE)
## The number of bootstrap samples should be much higher than this example
temp <- bsBootMiss(targetFit, transformation = 1, nBoot = 10, seed = 31415)
temp
summary(temp)
hist(temp)
hist(temp, printLegend = FALSE) # suppress the legend
## user can specify alpha level (default: alpha = 0.05), and the number of
## digits to display (default: nd = 2). Pass other arguments to hist(...),
## or a list of arguments to legend() via "legendArgs"
hist(temp, alpha = .01, nd = 3, xlab = "something else", breaks = 25,
legendArgs = list("bottomleft", box.lty = 2))
Small-N correction for chi^2 test statistic
Description
Calculate small-N corrections for the chi^2 model-fit test
statistic to adjust for small sample size (relative to model size).
Usage
chisqSmallN(fit0, fit1 = NULL, smallN.method = if (is.null(fit1))
c("swain", "yuan.2015") else "yuan.2005", ..., omit.imps = c("no.conv",
"no.se"))
Arguments
fit0, fit1 |
lavaan::lavaan or lavaan.mi::lavaan.mi object(s) |
smallN.method |
|
... |
Additional arguments to the |
omit.imps |
|
Details
Four finite-sample adjustments to the chi-squared statistic are currently available, all of which are described in Shi et al. (2018). These all assume normally distributed data, and may not work well with severely nonnormal data. Deng et al. (2018, section 4) review proposed small-N adjustments that do not assume normality, which rarely show promise, so they are not implemented here. This function currently will apply small-N adjustments to scaled test statistics with a warning that they do not perform well (Deng et al., 2018).
Value
A list of numeric vectors: one for the originally
requested statistic(s), along with one per requested smallN.method.
All include the (un)adjusted test statistic, its df, and the
p value for the test under the null hypothesis that the model fits
perfectly (or that the 2 models have equivalent fit).
The adjusted chi-squared statistic(s) also include(s) the scaling factor
for the small-N adjustment.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Deng, L., Yang, M., & Marcoulides, K. M. (2018). Structural equation modeling with many variables: A systematic review of issues and developments. Frontiers in Psychology, 9, 580. doi:10.3389/fpsyg.2018.00580
Shi, D., Lee, T., & Terry, R. A. (2018). Revisiting the model size effect in structural equation modeling. Structural Equation Modeling, 25(1), 21–40. doi:10.1080/10705511.2017.1369088
Examples
HS.model <- '
visual =~ x1 + b1*x2 + x3
textual =~ x4 + b2*x5 + x6
speed =~ x7 + b3*x8 + x9
'
fit1 <- cfa(HS.model, data = HolzingerSwineford1939[101:150,])
## test a single model (implicitly compared to a saturated model)
chisqSmallN(fit1)
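## a hedged sketch: request only one of the documented corrections by name
chisqSmallN(fit1, smallN.method = "swain")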
## fit a more constrained model
fit0 <- cfa(HS.model, data = HolzingerSwineford1939[101:150,],
orthogonal = TRUE)
## compare 2 models
chisqSmallN(fit1, fit0)
Copy or save the result of lavaan or FitDiff objects into a
clipboard or a file
Description
Copy or save the results of a lavaan or FitDiff
object to the clipboard or a file. From the clipboard, users may paste the
results into Microsoft Excel or another spreadsheet application to create a
table of the output.
Usage
clipboard(object, what = "summary", ...)
saveFile(object, file, what = "summary", tableFormat = FALSE,
fit.measures = "default", writeArgs = list(), ...)
Arguments
object |
An object of class lavaan::lavaan or FitDiff. |
what |
The attributes of the |
... |
Additional arguments when passing a |
file |
A file name used for saving the result. |
tableFormat |
If |
fit.measures |
|
writeArgs |
|
Value
The resulting output will be saved to the clipboard or to a file. If
using the clipboard function, users may paste it into other
applications.
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
Examples
library(lavaan)
HW.model <- ' visual =~ x1 + c1*x2 + x3
textual =~ x4 + c1*x5 + x6
speed =~ x7 + x8 + x9 '
fit <- cfa(HW.model, data = HolzingerSwineford1939, group = "school")
if(interactive()){
# Copy the summary of the lavaan object
clipboard(fit)
# pass additional arguments to summary() method for class?lavaan
clipboard(fit, rsquare = TRUE, standardized = TRUE, fit.measures = TRUE)
# Copy the EPC equivalence testing results from the epcEquivFit() function
clipboard(fit, "epceqfit")
# Copy the parameter estimates
clipboard(fit, "coef")
# Copy the standard errors
clipboard(fit, "se")
# Copy the sample statistics
clipboard(fit, "samp")
# Copy the fit measures
clipboard(fit, "fit")
# Save the summary of the lavaan object
saveFile(fit, "out.txt")
# Save the EPC equivalence testing results from the epcEquivFit() function
saveFile(fit, "out.txt", "epceqfit")
# Save the parameter estimates
saveFile(fit, "out.txt", "coef")
# Save the standard errors
saveFile(fit, "out.txt", "se")
# Save the sample statistics
saveFile(fit, "out.txt", "samp")
# Save the fit measures
saveFile(fit, "out.txt", "fit")
}
Combine the results from the quark function
Description
This function builds upon the quark() function to provide a
final dataset comprising the original dataset provided to
quark() and enough principal components to account
for a specified level of variance in the data.
Usage
combinequark(quark, percent)
Arguments
quark |
Provide the |
percent |
Provide a percentage of variance that you would like to have explained. That many components (columns) will be extracted and kept with the output dataset. Enter this variable as a number WITHOUT a percentage sign. |
Value
The output of this function is the original dataset used in quark combined with enough principal component scores to account for the requested amount of variance.
Author(s)
Steven R. Chesnut (University of Southern Mississippi; Steven.Chesnut@usm.edu)
See Also
Examples
set.seed(123321)
dat <- HolzingerSwineford1939[,7:15]
misspat <- matrix(runif(nrow(dat) * 9) < 0.3, nrow(dat))
dat[misspat] <- NA
dat <- cbind(HolzingerSwineford1939[,1:3], dat)
quark.list <- quark(data = dat, id = c(1, 2))
final.data <- combinequark(quark = quark.list, percent = 80)
Composite Reliability using SEM
Description
Calculate composite reliability from estimated factor-model parameters
Usage
compRelSEM(object, W = NULL, return.total = FALSE, obs.var = TRUE,
tau.eq = FALSE, ord.scale = TRUE, shared = character(0),
config = character(0), add.IRR = FALSE, higher = character(0),
true = list(), dropSingle = TRUE, omit.factors = character(0),
omit.indicators = character(0), omit.imps = c("no.conv", "no.se"),
simplify = FALSE, return.df = simplify)
Arguments
object |
A lavaan::lavaan or lavaan.mi::lavaan.mi object, expected to contain only exogenous common factors (i.e., a CFA model). |
W |
Composite weights applied to observed variables prior to summing.
By default ( |
return.total |
For multidimensional CFAs, this |
obs.var |
|
tau.eq |
|
ord.scale |
|
shared |
|
config |
Deprecated |
add.IRR |
|
higher |
Deprecated, supplanted by using the |
true |
Optional |
dropSingle |
When |
omit.factors |
Deprecated, supplanted by using the |
omit.indicators |
Deprecated, supplanted by using the |
omit.imps |
|
simplify |
|
return.df |
Deprecated |
Details
Several coefficients for factor-analysis reliability have been termed
"omega", which Cho (2021) argues is a misleading misnomer and argues for
using \rho to represent them all, differentiated by descriptive
subscripts. In our package, we strive to provide unlabeled coefficients,
leaving it to the user to decide on a label in their report. But we do
use the symbols \alpha and \omega in the formulas below in order
to distinguish coefficients that do (not) assume essential tau-equivalence.
Bentler (1968) first introduced factor-analysis reliability for a
unidimensional factor model with congeneric indicators, labeling the
coefficients \alpha. McDonald (1999) later referred to this
and other reliability coefficients, first as \theta (in 1970),
then as \omega, which is a source of confusion when reporting
coefficients (Cho, 2021). Coefficients based on factor models were later
generalized to account for multidimenisionality (possibly with
cross-loadings) and correlated errors. The general \omega formula
implemented in this function is:
\omega=\frac{\bold{w}^{\prime} \Lambda \Phi \Lambda^{\prime} \bold{w}
}{ \bold{w}^{\prime} \hat{\Sigma} \bold{w} },
where \hat{\Sigma} can be the model-implied covariance matrix from
either the saturated model (i.e., the "observed" covariance matrix, used by
default) or from the hypothesized CFA model, controlled by the obs.var=
argument. All elements of matrices in the numerator and denominator are
effectively summed by the multiplication of the outer terms \bold{w},
a k-dimensional vector of composite weights (typically consisting of
\bold{1}s, unless otherwise specified with the W= argument), where
k is the number of variables in the composite. Reliability of subscale
composites (or simply for separate factors in a joint CFA) can be calculated
by setting omitted-indicator weights to 0. For unidimensional constructs
with simple structure, the equation above is often simplified to a scalar
representation (e.g., McDonald, 1999, Eq. 6.20b):
\omega = \frac{ \left( \sum^{k}_{i = 1} \lambda_i \right)^{2}
Var\left( \psi \right) }{ \left( \sum^{k}_{i = 1} \lambda_i \right)^{2}
Var\left( \psi \right) + \sum^{k}_{i = 1} \theta_{ii} },
Note that all coefficients are calculated from total factor variances:
lavInspect(object, "cov.lv"), which assumes the fitted object= is a CFA,
not a full SEM with latent regression slopes. If there is a Beta matrix, it
should only contain higher-order factor loadings (see details below).
When the fitted CFA imposes constraints consistent with (essential)
tau-equivalence, \omega is equivalent to coefficient \alpha
(Cronbach, 1951):
\alpha = \frac{k}{k - 1}\left[ 1 -
\frac{ \textrm{tr} \left( \hat{\Sigma} \right)
}{ \bold{1}^{\prime} \hat{\Sigma} \bold{1} }
\right],
where \textrm{tr} \left( . \right) is the trace operation (i.e., the
sum of diagonal elements). Setting tau.eq=TRUE triggers the application of
this formula (rather than \omega above) to the model-implied or
observed covariance matrix (again controlled by the obs.var= argument).
Higher-Order Factors:
For higher-order constructs with latent indicators, only \omega is
available because \alpha was not derived from CFA parameters (although
it can be expressed in a particular restricted CFA specification).
The reliability of a composite that represents a higher-order construct
requires partitioning the model-implied factor covariance matrix \Phi
in order to isolate the common-factor variance associated only with the
higher-order factor. Using a second-order factor model, the model-implied
covariance matrix of observed indicators \hat{\Sigma} can be
partitioned into 3 sources:
- the second-order common-factor (co)variance: \Lambda \bold{B} \Phi_2 \bold{B}^{\prime} \Lambda^{\prime}
- the residual variance of the first-order common factors (i.e., not accounted for by the second-order factor): \Lambda \Psi_{u} \Lambda^{\prime}
- the measurement error of observed indicators: \Theta
where \Lambda contains first-order factor loadings, \bold{B}
contains second-order factor loadings, \Phi_2 is the model-implied
covariance matrix of the second-order factor(s), and \Psi_{u} is the
covariance matrix of first-order factor disturbances. In practice, we can
use the full \bold{B} matrix and full model-implied \Phi matrix
(i.e., including all latent factors) because the zeros in \bold{B}
will cancel out unwanted components of \Phi. Thus, we can calculate
the proportion of variance of a composite score that is attributable to the
second-order factor:
\omega=\frac{\bold{w}^{\prime} \Lambda \bold{B} \Phi \bold{B}^{\prime}
\Lambda^{\prime} \bold{w} }{ \bold{w}^{\prime} \hat{\Sigma} \bold{w}},
where \bold{w}, \hat{\Sigma}, and k are defined as above.
Note that if a higher-order factor also has observed indicators, it is
necessary to model the observed indicators as single-indicator lower-order
constructs, so that all of the higher-order factor indicators are latent
(with loadings in the Beta matrix, not Lambda); otherwise, higher-order
factor variance in the observed indicator is not captured in the numerator.
Bifactor or Multitrait–Multimethod (MTMM) Models:
These multidimensional models partition sources of common variance that are
due to the factor of interest (e.g., a trait) as well as non-target factors
(e.g., "method factors", such as item wording or type of respondent).
The latter can be considered as systematic (i.e., non-random) sources of
error, to be excluded from the numerator of a reliability coefficient,
yielding so-called "hierarchical omega" (\omega_\textrm{H}). On the
other hand, non-target variance that can be expected in repeated measurement
meets the classical test theory definition of reliability. Including method
factors in the numerator yields so-called "omega total"
(\omega_\textrm{T}), which is the default approach in compRelSEM()
because it is consistent with the classical test theory definition of
reliability. However, users can obtain \omega_\textrm{H} for a
composite by using the true= argument to specify any factor(s) to be
treated as representing true scores. The same approach can be taken to
obtain the proportion of a (sub)scale composite's variance due to method
factors (by listing those in true=), if that is of interest.
Categorical Indicators:
When all indicators (per composite) are ordinal, a CFA can be fitted that includes a threshold model (sometimes called Item Factor Analysis: IFA), which assumes a normally distributed latent response underlies each observed ordinal response. Despite making this assumption, a composite of ordinal items can only be calculated by assigning numerical values to the ordinal categories, so that the pseudo-numerical variables can be summed into a composite variable that is more approximately continuous than its items.
Applying the formulas above to IFA parameters provides the
hypothetical reliability of a composite of latent responses: a composite
which cannot be calculated in practice. Nonetheless, this hypothetical
reliability can be interpreted as an estimate of what reliability could be
if a more approximately continuous response scale were used (e.g., with
sufficiently many response categories that the standardized solutions are
equivalent between a fitted IFA and a fitted CFA that treats the ordinal
responses as numeric; Chalmers, 2018). This can be requested by setting
ord.scale=FALSE, in which case \hat\Sigma in the formulas above
is a polychoric correlation matrix.
When ord.scale=FALSE and tau.eq=TRUE, this results in what Zumbo et al.
(2007) termed "ordinal \alpha" (see criticisms by Chalmers, 2018, and
and a rejoinder by Zumbo & Kroc, 2019).
Alternatively, Green and Yang (2009, Eq. 21) derived a method to calculate
model-based reliability (\omega) from IFA parameters (i.e.,
incorporating the latent-response assumption) but that applies to the actual
(i.e., ordinal) observed response scale (the default: ord.scale=TRUE).
Lu et al. (2020) showed how to incorporate unequal weights into Green and
Yang's (2009) formula, so W= can be used to estimate the (maximal)
reliability of a weighted composite of ordinal variables.
However, combining ord.scale=TRUE with tau.eq=TRUE is not available.
For \alpha to be interpretable on the observed ordinal scale,
users must choose whether to (a) release the latent-response assumption, by
fitting a CFA without a threshold model, or (b) fit an IFA model with
constraints consistent with the assumption of (essential) tau-equivalence
(i.e., equal factor loadings).
No method analogous to Green and Yang (2009, Eq. 21) has yet been proposed to calculate reliability with a mixture of categorical and continuous indicators, so any such composite is skipped with a warning.
Multilevel Measurement Models:
How to define reliability coefficients for scales employed in nested designs is an ongoing topic of methodological development, with some ongoing controversies about best practice when the target of measurement is the "cluster" or between-level (i.e., Level 2 in a 2-level design). Geldhof et al. (2014) proposed applying the standard formulas above to each level's CFA parameters and/or (model-implied) covariance matrix, whereas Lai (2021) proposed different formulas that account for all sources of variance in composites of observed variables.
There is no controversy about how to define a within-level reliability
coefficient, which can be interpreted as the reliability of a composite
calculated by first centering each indicator around its cluster mean, then
calculating the composite from the cluster-mean-centered items. Equivalently
(i.e., the same formula), this can be interpreted as the hypothetical
reliability of a composite of the items' latent Level-1 components. This
coefficient can be requested with lavaan::model.syntax (to pass to the
W= argument) that specifies a composite in a Level-1 "block", which must not
have the same name as any composite in the Level-2 block. If users do not
use W= (i.e., calculate a reliability index per modeled common factor),
then this can be accomplished by using unique factor names across levels.
This contrasts with reliability indices for between-level composites: The reliability of a hypothetical composite of items' latent between-level components (using formulas proposed by Geldhof et al., 2014) is not equivalent to the coefficient for a composite of items' observed cluster means, using generalizations of formulas proposed by Lai (2021):
\omega^\textrm{B} =
\frac{\bold{w}^{\prime} \Lambda^\textrm{B} \Phi^\textrm{B} \Lambda^{\textrm{B}\prime} \bold{w}
}{ \bold{w}^{\prime} \hat{\Sigma}^\textrm{B} \bold{w} +
\frac{1}{\tilde{n}_\textrm{clus}} \left(
\bold{w}^{\prime} \hat{\Sigma}^\textrm{W} \bold{w} \right) },
\alpha^\textrm{B} = \frac{2k}{k - 1}\left[
\frac{ \sum^{k}_{i=2} \sum^{i-1}_{j=1} \hat\sigma^\textrm{B}_{ij}
}{ \bold{1}^{\prime} \hat\Sigma^\textrm{B} \bold{1} +
\frac{1}{\tilde{n}_\textrm{clus}} \left(
\bold{1}^{\prime} \hat\Sigma^\textrm{W} \bold{1} \right) }
\right],
where \tilde{n}_\textrm{clus} is the harmonic-mean cluster size, and
superscripts B and W indicate between- and within-level parameters.
Obtaining these estimates of composite reliability requires fitting a
2-level CFA that provides the same factor structure and factor names in
the models at both levels (following the advice of Jak et al., 2021), as
well as the same composite name in both levels/blocks of syntax passed to
W= (if used). Furthermore, the between-level composite name must be
passed to the shared= argument; otherwise, the same factor/composite name
across levels will yield Lai's (2021) coefficient for a configural construct
(see Examples):
\omega^\textrm{2L} =
\frac{\bold{w}^{\prime} \left(
\Lambda^\textrm{W} \Phi^\textrm{W} \Lambda^{\textrm{W}\prime} +
\Lambda^\textrm{B} \Phi^\textrm{B} \Lambda^{\textrm{B}\prime}
\right) \bold{w}
}{ \bold{w}^{\prime} \hat\Sigma^\textrm{B} \bold{w} +
\bold{w}^{\prime} \hat\Sigma^\textrm{W} \bold{w} },
\alpha^\textrm{2L} = \frac{2k}{k - 1}\left[
\frac{ \sum^{k}_{i=2} \sum^{i-1}_{j=1} \left( \hat\sigma^\textrm{W}_{ij} +
\hat\sigma^\textrm{B}_{ij} \right)
}{ \bold{1}^{\prime} \hat\Sigma^\textrm{B} \bold{1} +
\bold{1}^{\prime} \hat\Sigma^\textrm{W} \bold{1} }
\right],
This can be interpreted as the scale-reliability coefficient ignoring the nested design, as both the common-factor variance of the Level-1 factor and of its Level-2 cluster means are treated as true-score variance.
Note that Lai's (2021) between-level reliability coefficients for a
shared construct quantify generalizability across both indicators and
raters (i.e., subjects rating their cluster's construct).
Lüdtke et al. (2011) refer to these as measurement error and sampling error,
respectively. From this perspective (and following from generalizability
theory), an IRR coefficient can also be calculated:
\textrm{IRR} =
\frac{\bold{w}^{\prime} \left( \hat{\Sigma}^\textrm{B} \right) \bold{w}
}{ \bold{w}^{\prime} \hat\Sigma^\textrm{B} \bold{w} +
\bold{w}^{\prime} \hat\Sigma^\textrm{W} \bold{w} },
which quantifies generalizability across rater/sampling-error only, and can
be returned for any shared= construct's composite by setting add.IRR=TRUE.
Value
By default (simplify=FALSE) a list of numeric vectors (1 per
composite) is returned. In multigroup CFA, the vector contains a reliability
index for each group in which the composite can be computed.
Each composite's vector has an attr(..., "header") attribute with information to
facilitate interpretation of that index:
A list of variables in the composite, which determines the composite's total variance (denominator of reliability)
Whether that total variance (denominator) is determined from the restricted model (i.e., CFA parameters) or unrestricted model (i.e., a freely estimated covariance matrix)
Whether the variables in the composite are (a transformation of) observed variables, or whether they are latent (components of) variables. The latter (e.g., latent responses assumed to underlie observed ordinal indicators, or latent level-specific components of variables in a multilevel CFA) cannot be used to calculated an observed composite variable, so the resulting coefficient should be cautiously interpreted as a "hypothetical reliability" (Chalmers, 2018; Lai, 2021).
The latent variables that contribute common-factor variance to the composite, which determine the composite's "true-score" variance (numerator of reliability)
Which reliability formula was used: model-based reliability (so-called "omega") or coefficient alpha (a model-free lower-bound estimate of true reliability, equivalent to a model-based reliability that assumes tau-equivalence)
This header will be printed immediately above each composite's
reliability coefficient. When multiple reliability coefficients are
returned, and each vector in the list has the same length, then
setting simplify=TRUE will collect the list of single
coefficients into a vector, or the list of multiple coefficients
into a data.frame, and their headers will be concatenated to be
printed above the coefficients. Setting simplify = -1L (or any
negative number) will omit the informative headers.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
Uses hidden functions to implement Green & Yang's (2009) reliability for
categorical indicators, written by Sunthud Pornprasertmanit
(psunthud@gmail.com) for the deprecated reliability() function.
References
Bentler, P. M. (1968). Alpha-maximized factor analysis (alphamax): Its relation to alpha and canonical factor analysis. Psychometrika, 33(3), 335–345. doi:10.1007/BF02289328
Chalmers, R. P. (2018). On misconceptions and the limited usefulness of ordinal alpha. Educational and Psychological Measurement, 78(6), 1056–1071. doi:10.1177/0013164417727036
Cho, E. (2021) Neither Cronbach’s alpha nor McDonald’s omega: A commentary on Sijtsma and Pfadt. Psychometrika, 86(4), 877–886. doi:10.1007/s11336-021-09801-1
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297–334. doi:10.1007/BF02310555
Geldhof, G. J., Preacher, K. J., & Zyphur, M. J. (2014). Reliability estimation in a multilevel confirmatory factor analysis framework. Psychological Methods, 19(1), 72–91. doi:10.1037/a0032138
Green, S. B., & Yang, Y. (2009). Reliability of summed item scores using structural equation modeling: An alternative to coefficient alpha. Psychometrika, 74(1), 155–167. doi:10.1007/s11336-008-9099-3
Jak, S., Jorgensen, T. D., & Rosseel, Y. (2021). Evaluating cluster-level
factor models with lavaan and Mplus. Psych, 3(2),
134–152. doi:10.3390/psych3020012
Lai, M. H. C. (2021). Composite reliability of multilevel data: It’s about observed scores and construct meanings. Psychological Methods, 26(1), 90–102. doi:10.1037/met0000287
Lu, Z., Hong, M., & Kim, S. (2020). Formulas of multilevel reliabilities for tests with ordered categorical responses. In M. Wiberg, D. Molenaar, J. González, U. Böckenholt, & J.-S. Kim (Eds.), Quantitative psychology: The 85th annual meeting of the Psychometric Society, Virtual (pp. 103–112). Springer. doi:10.1007/978-3-030-74772-5_10
Lüdtke, O., Marsh, H. W., Robitzsch, A., & Trautwein, U. (2011).
A 2 \times 2 taxonomy of multilevel latent contextual models:
Accuracy–bias trade-offs in full and partial error correction models.
Psychological Methods, 16(4), 444–467. doi:10.1037/a0024376
McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Erlbaum.
Zumbo, B. D., Gadermann, A. M., & Zeisser, C. (2007). Ordinal versions of coefficients alpha and theta for Likert rating scales. Journal of Modern Applied Statistical Methods, 6(1), 21–29. doi:10.22237/jmasm/1177992180
Zumbo, B. D., & Kroc, E. (2019). A measurement is a choice and Stevens’ scales of measurement do not help make it: A response to Chalmers. Educational and Psychological Measurement, 79(6), 1184–1197. doi:10.1177/0013164419844305
See Also
maximalRelia() for the maximal reliability of weighted composite
Examples
data(HolzingerSwineford1939)
HS9 <- HolzingerSwineford1939[ , c("x7","x8","x9")]
HSbinary <- as.data.frame( lapply(HS9, cut, 2, labels=FALSE) )
names(HSbinary) <- c("y7","y8","y9")
HS <- cbind(HolzingerSwineford1939, HSbinary)
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ y7 + y8 + y9 '
fit <- cfa(HS.model, data = HS, ordered = c("y7","y8","y9"), std.lv = TRUE)
fitg <- cfa(HS.model, data = HS, ordered = c("y7","y8","y9"), std.lv = TRUE,
group = "school")
## works for factors with exclusively continuous OR categorical indicators
compRelSEM(fit)
compRelSEM(fitg)
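## Hedged sketches of other documented arguments (see Details and Value):
compRelSEM(fit, obs.var = FALSE)   # model-implied (rather than observed) total variance
compRelSEM(fitg, simplify = TRUE)  # collect coefficients across groups (see Value section)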
## reliability for composite of ALL indicators only available when they are
## all continuous or all categorical. The example below calculates a
## composite of continuous items from 2 factors (visual and textual)
## using the custom-weights syntax (note the "<~" operator)
w.tot <- '
visual <~ x1 + x2 + x3
textual <~ x4 + x5 + x6
total <~ x1 + x2 + x3 + x4 + x5 + x6
'
compRelSEM(fit, W = w.tot)
## ----------------------
## Higher-order construct
## ----------------------
## Reliability of a composite that represents a higher-order factor
mod.hi <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
general =~ visual + textual + speed '
fit.hi <- cfa(mod.hi, data = HolzingerSwineford1939)
## "general" is the factor representing "true scores", but it has no
## observed indicators. Must use custom-weights syntax:
compRelSEM(fit.hi, W = 'g <~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9')
## ----------------------
## Hierarchical omega
## and omega Total
## ----------------------
mod.bi <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
general =~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 '
fit.bi <- cfa(mod.bi, data = HolzingerSwineford1939,
orthogonal = TRUE, std.lv = TRUE)
compRelSEM(fit.bi, return.total = -1) # omega_Total
compRelSEM(fit.bi, return.total = -1, # omega_Hierarchical
true = list(.TOTAL. = "general"))
## ----------------------
## Multilevel Constructs
## ----------------------
## Same factor structure with metric invariance across levels (Jak et al., 2021)
model2 <- '
level: 1
f1 =~ y1 + L2*y2 + L3*y3
f2 =~ y4 + L5*y5 + L6*y6
level: 2
f1 =~ y1 + L2*y2 + L3*y3
f2 =~ y4 + L5*y5 + L6*y6
'
fit2 <- sem(model2, data = Demo.twolevel, cluster = "cluster")
## Lai's (2021, Eq. 13) omega index for a configural (Level-1) construct,
## treating common-factor variance at both levels as "true" variance
compRelSEM(fit2)
## Lai's (2021, Eq. 17) omega index for a shared (Level-2) construct
## (also its interrater reliability coefficient)
compRelSEM(fit2, shared = c("f1","f2"), add.IRR = TRUE)
## Geldhof et al.'s (2014) level-specific indices imply a different
## composite (hypothetically) calculated per level. Thus, use
## unique composite names per level.
W2.Geldhof <- ' level: 1
F1w <~ y1 + y2 + y3
F2w <~ y4 + y5 + y6
level: 2
F1b <~ y1 + y2 + y3
F2b <~ y4 + y5 + y6
'
compRelSEM(fit2, W = W2.Geldhof)
Build an object summarizing fit indices across multiple models
Description
This function will create the template to compare fit indices across multiple fitted lavaan objects. The results can be exported to a clipboard or a file later.
Usage
compareFit(..., nested = TRUE, argsLRT = list(), indices = TRUE,
moreIndices = FALSE, baseline.model = NULL, nPrior = 1)
Arguments
... |
fitted |
nested |
|
argsLRT |
|
indices |
|
moreIndices |
|
baseline.model |
optional fitted lavaan::lavaan model passed to
|
nPrior |
passed to |
Value
A FitDiff object that saves model fit comparisons across multiple models. If the models are not nested, only fit indices for each model are returned. If the models are nested, the differences in fit indices are additionally returned, as well as test statistics comparing each sequential pair of models (ordered by their degrees of freedom).
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
Sunthud Pornprasertmanit (psunthud@gmail.com)
See Also
Examples
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '
## non-nested models
fit1 <- cfa(HS.model, data = HolzingerSwineford1939)
m2 <- ' f1 =~ x1 + x2 + x3 + x4
f2 =~ x5 + x6 + x7 + x8 + x9 '
fit2 <- cfa(m2, data = HolzingerSwineford1939)
(out1 <- compareFit(fit1, fit2, nested = FALSE))
summary(out1)
## nested model comparisons: measurement equivalence/invariance
fit.config <- cfa(HS.model, data = HolzingerSwineford1939, group = "school")
fit.metric <- cfa(HS.model, data = HolzingerSwineford1939, group = "school",
group.equal = "loadings")
fit.scalar <- cfa(HS.model, data = HolzingerSwineford1939, group = "school",
group.equal = c("loadings","intercepts"))
fit.strict <- cfa(HS.model, data = HolzingerSwineford1939, group = "school",
group.equal = c("loadings","intercepts","residuals"))
measEqOut <- compareFit(fit.config, fit.metric, fit.scalar, fit.strict,
moreIndices = TRUE) # include moreFitIndices()
summary(measEqOut)
summary(measEqOut, fit.measures = "all")
summary(measEqOut, fit.measures = c("aic", "bic", "sic", "ibic"))
Simulated Dataset to Demonstrate Two-way Latent Interaction
Description
A simulated data set with 2 independent factors and 1 dependent factor where each factor has three indicators
Usage
dat2way
Format
A data.frame with 500 observations of 9 variables.
- x1
The first indicator of the first independent factor
- x2
The second indicator of the first independent factor
- x3
The third indicator of the first independent factor
- x4
The first indicator of the second independent factor
- x5
The second indicator of the second independent factor
- x6
The third indicator of the second independent factor
- x7
The first indicator of the dependent factor
- x8
The second indicator of the dependent factor
- x9
The third indicator of the dependent factor
Source
Data were generated by the MASS::mvrnorm() function in
the MASS package.
Examples
head(dat2way)
Simulated Dataset to Demonstrate Three-way Latent Interaction
Description
A simulated data set with 3 independent factors and 1 dependent factor where each factor has three indicators
Usage
dat3way
Format
A data.frame with 500 observations of 12 variables.
- x1
The first indicator of the first independent factor
- x2
The second indicator of the first independent factor
- x3
The third indicator of the first independent factor
- x4
The first indicator of the second independent factor
- x5
The second indicator of the second independent factor
- x6
The third indicator of the second independent factor
- x7
The first indicator of the third independent factor
- x8
The second indicator of the third independent factor
- x9
The third indicator of the third independent factor
- x10
The first indicator of the dependent factor
- x11
The second indicator of the dependent factor
- x12
The third indicator of the dependent factor
Source
Data were generated by the MASS::mvrnorm() function in
the MASS package.
Examples
head(dat3way)
Simulated Data set to Demonstrate Categorical Measurement Invariance
Description
A simulated data set with 2 factors, each with 4 indicators, separated into two groups
Usage
datCat
Format
A data.frame with 200 observations of 9 variables.
- g
Sex of respondents
- u1
Indicator 1
- u2
Indicator 2
- u3
Indicator 3
- u4
Indicator 4
- u5
Indicator 5
- u6
Indicator 6
- u7
Indicator 7
- u8
Indicator 8
Source
Data were generated using the lavaan package.
Examples
head(datCat)
Calculate discriminant validity statistics
Description
Calculate discriminant validity statistics based on a fitted lavaan object
Usage
discriminantValidity(object, cutoff = 0.9, merge = FALSE, level = 0.95,
boot.ci.type = "perc")
Arguments
object |
The lavaan::lavaan model object returned by
the |
cutoff |
A cutoff to be used in the constrained models in likelihood ratio tests. |
merge |
Whether the constrained models should be constructed by merging
two factors as one. Implies |
level |
The confidence level required. |
boot.ci.type |
If bootstrapping was used, the type of interval required.
The value should be one of |
Details
Evaluated at the level of the measurement scales, discriminant validity is commonly assessed by checking whether each latent correlation is sufficiently below one (in absolute value) that the latent variables can be thought of as representing two distinct constructs.
The discriminantValidity function calculates two sets of statistics that
are commonly used in discriminant validity evaluation. The first set consists of
factor correlation estimates and their confidence intervals. The second set
is a series of nested model tests, where the baseline model is compared
against a set of constrained models that are constructed by constraining
each factor correlation to the specified cutoff, one at a time.
The function assumes that object contains confirmatory factor
analysis results in which the latent variables are scaled by fixing their
variances to 1. If the model is not a CFA model, the function calculates
the statistics for the correlations among exogenous latent variables, but
uses the residual (co)variances for endogenous latent variables. If the
latent variables are scaled in some other way (e.g., by fixing the first loadings),
the function issues a warning and re-estimates the model by fixing the latent
variances to 1 (and estimating all loadings), so that factor covariances are
already estimated as correlations.
The likelihood ratio tests are done by comparing the original baseline model
against more constrained alternatives. By default, these alternatives are
constructed by fixing each correlation, one at a time, to a cutoff value. The
typical purpose of this test is to demonstrate that the estimated factor
correlation is well below the cutoff and a significant chi^2 statistic
thus indicates support for discriminant validity. In some cases, the original
correlation estimate may already be greater than the cutoff, making it
redundant to fit a "restricted" model. When this happens, the likelihood
ratio test will be replaced by comparing the baseline model against itself.
For correlations that are estimated to be negative, a negation of the cutoff
is used in the constrained model.
Another alternative is to do a nested model comparison against a model where
two factors are merged as one by setting the merge argument to
TRUE. In this comparison, the constrained model is constructed by
removing one of the correlated factors from the model and assigning its
indicators to the factor that remains in the model.
Value
A data.frame of latent variable correlation estimates, their
confidence intervals, and likelihood ratio tests against the constrained models,
with the following attributes:
- baseline
The baseline model after possible rescaling.
- constrained
A list of the fitted constrained models used in the likelihood ratio test.
Author(s)
Mikko Rönkkö (University of Jyväskylä; mikko.ronkko@jyu.fi)
References
Rönkkö, M., & Cho, E. (2022). An updated guideline for assessing discriminant validity. Organizational Research Methods, 25(1), 6–14. doi:10.1177/1094428120968614
Examples
library(lavaan)
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '
fit <- cfa(HS.model, data = HolzingerSwineford1939)
discriminantValidity(fit)
discriminantValidity(fit, merge = TRUE)
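## A brief illustrative sketch (not part of the original examples): per the
## Value section, the (possibly rescaled) baseline model and the constrained
## models used in the likelihood ratio tests are stored as attributes.
out <- discriminantValidity(fit)
out
length(attr(out, "constrained"))   # one constrained model per factor pair
attr(out, "baseline")              # baseline model after possible rescaling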
Empirical Kaiser criterion
Description
Identify the number of factors to extract based on the Empirical Kaiser
Criterion (EKC). The analysis can be run on a data.frame or data
matrix (data), or on a correlation or covariance matrix
(sample.cov) and the sample size (sample.nobs). A
data.frame is returned with two columns: the eigenvalues from your
data or covariance matrix and the reference eigenvalues. The number of
factors suggested by the Empirical Kaiser Criterion (i.e. the sample
eigenvalues greater than the reference eigenvalues), and the number of
factors suggested by the original Kaiser Criterion
(i.e. sample eigenvalues > 1) are printed above the output.
Usage
efa.ekc(data = NULL, sample.cov = NULL, sample.nobs = NULL,
missing = "default", ordered = NULL, plot = TRUE)
Arguments
data |
A |
sample.cov |
A covariance or correlation matrix can be used, instead of
|
sample.nobs |
Number of observations (i.e. sample size) if
|
missing |
If |
ordered |
|
plot |
logical. Whether to print a scree plot comparing the sample eigenvalues with the reference eigenvalues. |
Value
A data.frame showing the sample and reference eigenvalues.
The number of factors suggested by the Empirical Kaiser Criterion (i.e. the sample eigenvalues greater than the reference eigenvalues) is returned as an attribute (see Examples).
The number of factors suggested by the original Kaiser Criterion (i.e.
sample eigenvalues > 1) is also printed as a header to the data.frame.
Author(s)
Ylenio Longo (University of Nottingham; yleniolongo@gmail.com)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Braeken, J., & van Assen, M. A. L. M. (2017). An empirical Kaiser criterion. Psychological Methods, 22(3), 450–466. doi:10.1037/met0000074
Examples
## Simulate data with 3 factors
model <- '
f1 =~ .3*x1 + .5*x2 + .4*x3
f2 =~ .3*x4 + .5*x5 + .4*x6
f3 =~ .3*x7 + .5*x8 + .4*x9
'
dat <- simulateData(model, seed = 123)
## save summary statistics
myCovMat <- cov(dat)
myCorMat <- cor(dat)
N <- nrow(dat)
## Run the EKC function
(out <- efa.ekc(dat))
## To extract the recommended number of factors using the EKC:
attr(out, "nfactors")
## If you do not have raw data, you can use summary statistics
(x1 <- efa.ekc(sample.cov = myCovMat, sample.nobs = N, plot = FALSE))
(x2 <- efa.ekc(sample.cov = myCorMat, sample.nobs = N, plot = FALSE))
EPC Equivalence Feasibility Check for Standardized Parameters
Description
Performs an EPC-based feasibility check to assess whether a set of
standardized population parameters defines a valid population
covariance matrix and whether trivially misspecified parameters
remain within a user-defined smallest effect size of interest (SESOI).
Feasibility is evaluated by constructing implied population models
under targeted parameter perturbations and examining EPC behavior
using epcEquivFit.
Usage
epcEquivCheck(lavaanObj, minRelEffect = 0.75, stdLoad = 0.4, cor = 0.1,
corLatent = NULL, corResidual = NULL, stdBeta = 0.1)
Arguments
lavaanObj |
A fitted |
minRelEffect |
A scalar in (0, 1) specifying the minimum relative magnitude of the standardized perturbation to be evaluated. The default value of 0.75 indicates that perturbations equal to 75% of the SESOI are treated as trivial. If EPCs exceed the SESOI under such perturbations, EPC equivalence testing is not recommended. |
stdLoad |
Standardized factor loading used to define the SESOI for loading misspecifications. |
cor |
Standardized correlation used as a default SESOI for
covariance misspecifications. This value is used for both latent
and residual covariances unless overridden by
|
corLatent |
Standardized latent factor correlation used to
define the SESOI for latent covariance misspecifications. If
|
corResidual |
Standardized residual correlation used to define
the SESOI for indicator residual covariance misspecifications. If
|
stdBeta |
Standardized regression coefficient used to define the SESOI for structural misspecifications. |
Details
This function focuses on standardized parameters and supports recursive SEMs with continuous indicators only.
The procedure first checks whether the standardized parameters imply
a positive definite population covariance matrix. It then evaluates
EPC behavior under both positive and negative trivial
misspecifications by repeatedly constructing implied population
covariance matrices with perturbed parameters
(minRelEffect \times SESOI), refitting the model, and
re-evaluating EPCs.
Models with categorical indicators, formative indicators, or multiple-group structures are not supported.
Value
An object of class "epcEquivCheckStd" containing:
- feasible: Logical indicator of whether a valid standardized population model exists.
- any_M: Logical indicator of whether any EPC exceeded the SESOI under the evaluated misspecifications.
- recommendation: Character string summarizing feasibility (e.g., "RECOMMENDED", "NOT RECOMMENDED").
- M_table: Data frame summarizing EPCs exceeding the SESOI, if any.
- testeffect: Data frame reporting the smallest tested standardized perturbations in each direction.
See Also
Examples
library(lavaan)
one.model <- ' onefactor =~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 '
fit <- cfa(one.model, data = HolzingerSwineford1939)
epcEquivCheck(fit)
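## A minimal sketch (assuming the components listed under Value can be
## accessed by name) of acting on the feasibility check before proceeding
## to EPC-based equivalence testing:
chk <- epcEquivCheck(fit)
chk$recommendation    # e.g., "RECOMMENDED" or "NOT RECOMMENDED"
if (isTRUE(chk$feasible) && !isTRUE(chk$any_M)) epcEquivFit(fit)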
EPC Equivalence Fit Evaluation Using Modification Indices
Description
Evaluates model fit from an equivalence-testing perspective by aggregating local EPC-based diagnostics into a global, fit-style assessment. The procedure combines modification indices (MI), expected parameter changes (EPC), statistical power, and confidence intervals relative to a smallest effect size of interest (SESOI).
Usage
epcEquivFit(lavaanObj, stdLoad = 0.4, cor = 0.1, corLatent = NULL,
corResidual = NULL, stdBeta = 0.1, stdIntcept = 0.2, stdSesoi = NULL,
sesoi = NULL, cilevel = 0.9, ...)
## S3 method for class 'epcequivfit.data.frame'
summary(object, ..., top = 5, ssv = FALSE)
Arguments
lavaanObj |
A fitted |
stdLoad |
Standardized factor loading defining the SESOI for loading misspecifications. Default is 0.4. |
cor |
Default standardized correlation defining the SESOI for covariance misspecifications. Used for both latent and residual covariances unless overridden. |
corLatent |
Standardized latent factor correlation defining the
SESOI for latent covariance misspecifications. If |
corResidual |
Standardized residual correlation defining the
SESOI for residual covariance misspecifications. If |
stdBeta |
Standardized regression coefficient defining the SESOI for structural misspecifications. Default is 0.1. |
stdIntcept |
Standardized intercept (Cohen's d) defining the SESOI for intercept misspecifications. Default is 0.2. |
stdSesoi |
Optional vector of standardized SESOI values. If provided, overrides operator-specific SESOI definitions. |
sesoi |
Optional vector of unstandardized SESOI values. If
provided, overrides |
cilevel |
Confidence level for EPC confidence intervals used in CI-based equivalence testing. |
... |
Additional arguments passed to
|
object |
An object returned by |
top |
Number of top-ranked EPCs to display. |
ssv |
Logical; whether to include power-based diagnostics. |
Details
Two complementary local decision rules are implemented:
Method 1 (Power-based; Saris, Satorra, & van der Veld, 2009). Modification indices, statistical power, and EPC magnitude are jointly evaluated (the J-rule) to classify fixed parameters as misspecified, not misspecified, or inconclusive.
Method 2 (CI-based equivalence testing). Confidence intervals of EPCs are compared against a trivial misspecification region defined by the SESOI to determine whether fixed parameters are substantially misspecified, trivially misspecified, underpowered, or inconclusive.
The resulting local classifications are returned in a single data frame and can be summarized to yield a global equivalence-style fit evaluation.
This function provides a local-to-global equivalence-based alternative to traditional exact-fit evaluation. It is designed to assess whether fixed parameters are substantively misspecified relative to a SESOI, rather than whether a model fits exactly.
Models with categorical indicators or unsupported constraints may not be fully supported.
Value
A data frame with one row per fixed parameter, containing:
- Parameter identifiers: lhs, op, rhs, and group.
- Modification index (mi) and expected parameter change estimates (epc).
- Unstandardized and standardized smallest effect size of interest values (sesoi, std.sesoi).
- Power-based decision (decision.pow) and related diagnostics, including whether the modification index is statistically significant (significant.mi) and whether the misfit at the SESOI has power greater than 0.80 (high.power). Decision labels are: M = Substantially Misspecified, I = Inconclusive, NM = Trivially Misspecified, EPC:M = Substantially Misspecified based on EPC information, EPC:NM = Trivially Misspecified based on EPC information.
- EPC-related statistics, including the standard error of the EPC (se.epc), confidence interval bounds for the EPC (lower.epc, upper.epc), and confidence interval bounds for the standardized EPC (lower.std.epc, upper.std.epc).
- Confidence-interval-based equivalence decision (decision.ci), with labels: M = Substantially Misspecified (EPC exceeds the SESOI), I = Inconclusive, NM = Trivially Misspecified, U = Underpowered (CI too wide to evaluate equivalence relative to the SESOI).
References
Saris, W. E., Satorra, A., & van der Veld, W. M. (2009). Testing structural equation models or detection of misspecifications? Structural Equation Modeling, 16(4), 561–582.
See Also
Examples
library(lavaan)
one.model <- ' onefactor =~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 '
fit <- cfa(one.model, data = HolzingerSwineford1939)
out <- epcEquivFit(fit)
out
summary(out)
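## A minimal sketch of the documented summary() options, and of filtering
## the returned data frame on the CI-based decision column (decision.ci):
summary(out, top = 10, ssv = TRUE)  # show more rows, with power diagnostics
out[out$decision.ci == "M", ]       # parameters flagged as substantially misspecified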
Simulated Data set to Demonstrate Longitudinal Measurement Invariance
Description
A simulated data set with 1 factor with 3 indicators at three time points
Usage
exLong
Format
A data.frame with 200 observations of 10 variables.
- sex
Sex of respondents
- y1t1
Indicator 1 in Time 1
- y2t1
Indicator 2 in Time 1
- y3t1
Indicator 3 in Time 1
- y1t2
Indicator 1 in Time 2
- y2t2
Indicator 2 in Time 2
- y3t2
Indicator 3 in Time 2
- y1t3
Indicator 1 in Time 3
- y2t3
Indicator 2 in Time 3
- y3t3
Indicator 3 in Time 3
Source
Data were generated using the simsem package.
Examples
head(exLong)
Find the statistical power based on population RMSEA
Description
Find the proportion of samples from the sampling distribution of RMSEA under the alternative hypothesis that are rejected by the cutoff derived from the sampling distribution of RMSEA under the null hypothesis. This function can be applied to both the test of close fit and the test of not-close fit (MacCallum, Browne, & Sugawara, 1996).
Usage
findRMSEApower(rmsea0, rmseaA, df, n, alpha = 0.05, group = 1)
Arguments
rmsea0 |
Null RMSEA |
rmseaA |
Alternative RMSEA |
df |
Model degrees of freedom |
n |
Sample size of a dataset |
alpha |
Alpha level used in power calculations |
group |
The number of groups used to calculate RMSEA. |
Details
This function finds the proportion of the sampling distribution derived from the
alternative RMSEA that falls in the critical region derived from the sampling
distribution of the null RMSEA. If rmseaA is greater than
rmsea0, the test of close fit is used and the critical region is in
the right-hand tail of the null sampling distribution. Conversely, if
rmseaA is less than rmsea0, the test of not-close fit is used
and the critical region is in the left-hand tail of the null sampling
distribution (MacCallum, Browne, & Sugawara, 1996).
There is also a Shiny app called "power4SEM" that provides a graphical user interface for this functionality (Jak et al., 2021). It can be accessed at https://sjak.shinyapps.io/power4SEM/.
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
References
MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1(2), 130–149. doi:10.1037/1082-989X.1.2.130
Jak, S., Jorgensen, T. D., Verdam, M. G., Oort, F. J., & Elffers, L. (2021). Analytical power calculations for structural equation modeling: A tutorial and Shiny app. Behavior Research Methods, 53, 1385–1406. doi:10.3758/s13428-020-01479-0
See Also
- plotRMSEApower() to plot the statistical power based on population RMSEA given the sample size
- plotRMSEAdist() to visualize the RMSEA distributions
- findRMSEAsamplesize() to find the minimum sample size for a given statistical power based on population RMSEA
Examples
findRMSEApower(rmsea0 = .05, rmseaA = .08, df = 20, n = 200)
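## Test of not-close fit (rmseaA < rmsea0), as described in Details:
findRMSEApower(rmsea0 = .05, rmseaA = .01, df = 20, n = 200)
## Illustrative sketch: power as a function of sample size
sapply(c(100, 200, 400, 800), function(N)
  findRMSEApower(rmsea0 = .05, rmseaA = .08, df = 20, n = N))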
Find power given a sample size in nested model comparison
Description
Find the statistical power to reject samples drawn from the alternative pair of RMSEA values, given a sample size, in a nested model comparison.
Usage
findRMSEApowernested(rmsea0A = NULL, rmsea0B = NULL, rmsea1A,
rmsea1B = NULL, dfA, dfB, n, alpha = 0.05, group = 1)
Arguments
rmsea0A |
The |
rmsea0B |
The |
rmsea1A |
The |
rmsea1B |
The |
dfA |
Degrees of freedom of the more-restricted model |
dfB |
Degrees of freedom of the less-restricted model |
n |
Sample size |
alpha |
The alpha level |
group |
The number of groups used in calculating RMSEA |
Author(s)
Bell Clinton
Pavel Panko (Texas Tech University; pavel.panko@ttu.edu)
Sunthud Pornprasertmanit (psunthud@gmail.com)
References
MacCallum, R. C., Browne, M. W., & Cai, L. (2006). Testing differences between nested covariance structure models: Power analysis and null hypotheses. Psychological Methods, 11(1), 19–35. doi:10.1037/1082-989X.11.1.19
See Also
- plotRMSEApowernested() to plot the statistical power for nested model comparison based on population RMSEA given the sample size
- findRMSEAsamplesizenested() to find the minimum sample size for a given statistical power in nested model comparison based on population RMSEA
Examples
findRMSEApowernested(rmsea0A = 0.06, rmsea0B = 0.05, rmsea1A = 0.08,
rmsea1B = 0.05, dfA = 22, dfB = 20, n = 200,
alpha = 0.05, group = 1)
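## Illustrative sketch (assuming the function returns the power as a numeric
## value): power for the same nested comparison across several sample sizes
sapply(c(100, 200, 500), function(N)
  findRMSEApowernested(rmsea0A = 0.06, rmsea0B = 0.05, rmsea1A = 0.08,
                       rmsea1B = 0.05, dfA = 22, dfB = 20, n = N,
                       alpha = 0.05, group = 1))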
Find the minimum sample size for a given statistical power based on population RMSEA
Description
Find the minimum sample size for a specified statistical power based on population RMSEA. This function can be applied to both the test of close fit and the test of not-close fit (MacCallum, Browne, & Sugawara, 1996).
Usage
findRMSEAsamplesize(rmsea0, rmseaA, df, power = 0.8, alpha = 0.05,
group = 1)
Arguments
rmsea0 |
Null RMSEA |
rmseaA |
Alternative RMSEA |
df |
Model degrees of freedom |
power |
Desired statistical power to reject misspecified model (test of close fit) or retain good model (test of not-close fit) |
alpha |
Alpha level used in power calculations |
group |
The number of groups used to calculate RMSEA. |
Details
This function finds the minimum sample size for a specified power using an
iterative routine. The sample size keeps increasing until the power calculated
by the findRMSEApower() function is just over the specified
power. If group is greater than 1, the resulting sample size is the
sample size per group.
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
References
MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1(2), 130–149. doi:10.1037/1082-989X.1.2.130
Jak, S., Jorgensen, T. D., Verdam, M. G., Oort, F. J., & Elffers, L. (2021). Analytical power calculations for structural equation modeling: A tutorial and Shiny app. Behavior Research Methods, 53, 1385–1406. doi:10.3758/s13428-020-01479-0
See Also
- plotRMSEApower() to plot the statistical power based on population RMSEA given the sample size
- plotRMSEAdist() to visualize the RMSEA distributions
- findRMSEApower() to find the statistical power based on population RMSEA given a sample size
Examples
findRMSEAsamplesize(rmsea0 = .05, rmseaA = .08, df = 20, power = 0.80)
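## Test of not-close fit (rmseaA < rmsea0):
findRMSEAsamplesize(rmsea0 = .05, rmseaA = .01, df = 20, power = 0.80)
## With group = 2, the result is the required sample size per group (see Details):
findRMSEAsamplesize(rmsea0 = .05, rmseaA = .08, df = 20, power = 0.80,
                    group = 2)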
Find sample size given a power in nested model comparison
Description
Find the sample size at which the power to reject samples drawn from the alternative pair of RMSEA values is just over the specified power, in a nested model comparison.
Usage
findRMSEAsamplesizenested(rmsea0A = NULL, rmsea0B = NULL, rmsea1A,
rmsea1B = NULL, dfA, dfB, power = 0.8, alpha = 0.05, group = 1)
Arguments
rmsea0A |
The |
rmsea0B |
The |
rmsea1A |
The |
rmsea1B |
The |
dfA |
Degrees of freedom of the more-restricted model. |
dfB |
Degrees of freedom of the less-restricted model. |
power |
The desired statistical power. |
alpha |
The alpha level. |
group |
The number of groups used in calculating RMSEA. |
Author(s)
Bell Clinton
Pavel Panko (Texas Tech University; pavel.panko@ttu.edu)
Sunthud Pornprasertmanit (psunthud@gmail.com)
References
MacCallum, R. C., Browne, M. W., & Cai, L. (2006). Testing differences between nested covariance structure models: Power analysis and null hypotheses. Psychological Methods, 11(1), 19–35. doi:10.1037/1082-989X.11.1.19
See Also
- plotRMSEApowernested() to plot the statistical power for nested model comparison based on population RMSEA given the sample size
- findRMSEApowernested() to find the power for a given sample size in nested model comparison based on population RMSEA
Examples
findRMSEAsamplesizenested(rmsea0A = 0, rmsea0B = 0, rmsea1A = 0.06,
rmsea1B = 0.05, dfA = 22, dfB = 20, power = 0.80,
alpha = .05, group = 1)
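## Sanity-check sketch (assuming the returned value is the numeric sample
## size): plugging it back into findRMSEApowernested() should give power
## just over the requested .80
N <- findRMSEAsamplesizenested(rmsea0A = 0, rmsea0B = 0, rmsea1A = 0.06,
                               rmsea1B = 0.05, dfA = 22, dfB = 20,
                               power = 0.80, alpha = .05, group = 1)
findRMSEApowernested(rmsea0A = 0, rmsea0B = 0, rmsea1A = 0.06,
                     rmsea1B = 0.05, dfA = 22, dfB = 20, n = N,
                     alpha = .05, group = 1)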
Fraction of Missing Information.
Description
This function estimates the Fraction of Missing Information (FMI) for summary statistics of each variable, using either an incomplete data set or a list of imputed data sets.
Usage
fmi(data, method = "saturated", group = NULL, ords = NULL,
varnames = NULL, exclude = NULL, return.fit = FALSE)
Arguments
data |
Either a single |
method |
character. If |
group |
|
ords |
Optional |
varnames |
Optional |
exclude |
Optional |
return.fit |
logical. If |
Details
The function estimates a saturated model with lavaan::lavaan() for a
single incomplete data set using FIML, or with lavaan.mi::lavaan.mi()
for a list of imputed data sets. If method = "saturated", FMI will be
estimated for all summary statistics, which could take a lot of time with
big data sets. If method = "null", FMI will only be estimated for
univariate statistics (e.g., means, variances, thresholds). The saturated
model gives more reliable estimates, so it could also help to request a
subset of variables from a large data set.
Value
fmi() returns a list with at least 2 of the following:
Covariances |
A list of symmetric matrices: (1) the estimated/pooled
covariance matrix, or a list of group-specific matrices (if applicable)
and (2) a matrix of FMI, or a list of group-specific matrices (if
applicable). Only available if |
Variances |
The estimated/pooled variance for each numeric variable.
Only available if |
Means |
The estimated/pooled mean for each numeric variable. |
Thresholds |
The estimated/pooled threshold(s) for each ordered-categorical variable. |
Author(s)
Mauricio Garnier Villarreal (Vrije Universiteit Amsterdam; m.garniervillarreal@vu.nl)
Terrence Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York, NY: Wiley.
Savalei, V. & Rhemtulla, M. (2012). On obtaining estimates of the fraction of missing information from full information maximum likelihood. Structural Equation Modeling, 19(3), 477–494. doi:10.1080/10705511.2012.687669
Wagner, J. (2010). The fraction of missing information as a tool for monitoring the quality of survey data. Public Opinion Quarterly, 74(2), 223–243. doi:10.1093/poq/nfq007
Examples
HSMiss <- HolzingerSwineford1939[ , c(paste("x", 1:9, sep = ""),
"ageyr","agemo","school")]
set.seed(12345)
HSMiss$x5 <- ifelse(HSMiss$x5 <= quantile(HSMiss$x5, .3), NA, HSMiss$x5)
age <- HSMiss$ageyr + HSMiss$agemo/12
HSMiss$x9 <- ifelse(age <= quantile(age, .3), NA, HSMiss$x9)
## calculate FMI (using FIML, provide partially observed data set)
(out1 <- fmi(HSMiss, exclude = "school"))
(out2 <- fmi(HSMiss, exclude = "school", method = "null"))
(out3 <- fmi(HSMiss, varnames = c("x5","x6","x7","x8","x9")))
(out4 <- fmi(HSMiss, method = "cor", group = "school")) # correlations by group
## significance tests in lavaan(.mi) object
out5 <- fmi(HSMiss, method = "cor", return.fit = TRUE)
summary(out5) # factor loading == SD, covariance = correlation
if(requireNamespace("lavaan.mi")){
## ordered-categorical data
data(binHS5imps, package = "lavaan.mi")
## calculate FMI, using list of imputed data sets
fmi(binHS5imps, group = "school")
}
Wrapper for goric.lavaan() from the restriktor package
Description
The goricaSEM() function is an interface to restriktor::goric.lavaan(),
allowing users to perform generalized order-restricted information criterion
approximation (GORICA) analysis specifically for structural equation
models fitted using the lavaan package.
Usage
goricaSEM(object, ..., hypotheses = NULL, comparison = NULL,
type = "gorica", standardized = FALSE, debug = FALSE)
Arguments
object |
A lavaan::lavaan object. |
... |
Additional arguments passed to |
hypotheses |
A named |
comparison |
A |
type |
A |
standardized |
|
debug |
|
Details
This function is designed as a wrapper for the restriktor::goric.lavaan()
function. It calculates GORICA values and weights, which can be used to
compare models or hypotheses under inequality constraints.
The hypotheses= argument allows users to specify constraints in text-based
syntax or matrix notation. For text-based syntax, constraints are specified
as a string (e.g., "a1 > a2"). For matrix notation, a named list with
$constraints, $rhs, and $neq elements can be provided.
The comparison= argument determines whether the specified hypothesis is
compared against its "complement", the "unconstrained" model, or
neither ("none").
Value
A list containing the results of the goric.lavaan() function,
including:
- The log-likelihood.
- The penalty term.
- GORIC(A) values and weights.
- Relative GORIC(A) weights.
Author(s)
Leonard Vanbrabant and Rebecca Kuiper
References
Kuiper, R. M., Hoijtink, H., & Silvapulle, M. J. (2011). An Akaike-type information criterion for model selection under inequality constraints. Biometrika, 98(2), 495–501. doi:10.1093/biomet/asr002
Vanbrabant, L., Van Loey, N., & Kuiper, R. M. (2020). Evaluating a theory-based hypothesis against its complement using an AIC-type information criterion with an application to facial burn injury. Psychological Methods, 25(2), 129–142. doi:10.1037/met0000238
See Also
Examples
## Example: Perform GORICA analysis on a lavaan model
library(lavaan)
library(restriktor)
## Define the SEM model
model <- '
ind60 =~ x1 + x2 + x3
dem60 =~ y1 + a1*y2 + b1*y3 + c1*y4
dem65 =~ y5 + a2*y6 + b2*y7 + c2*y8
dem60 ~ ind60
dem65 ~ ind60 + dem60
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8
'
## Fit the model
data(PoliticalDemocracy)
fit <- sem(model, data = PoliticalDemocracy)
## Define hypotheses
myHypothesis <- 'a1 > a2, b1 > b2, c1 > c2'
## Perform GORICA analysis
result <- goricaSEM(fit, hypotheses = list(H1 = myHypothesis),
standardized = FALSE, comparison = "complement",
type = "gorica")
## Print result
print(result)
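## A brief sketch of another documented comparison= option: evaluate the same
## hypothesis against the unconstrained model instead of its complement
goricaSEM(fit, hypotheses = list(H1 = myHypothesis),
          comparison = "unconstrained", type = "gorica")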
Assessing Discriminant Validity using Heterotrait–Monotrait Ratio
Description
This function assesses discriminant validity through the
heterotrait-monotrait ratio (HTMT) of the correlations (Henseler, Ringle, &
Sarstedt, 2015). Specifically, it assesses the arithmetic (Henseler et al.,
2015) or geometric (Roemer et al., 2021) mean correlation
among indicators across constructs (i.e. heterotrait–heteromethod
correlations) relative to the geometric-mean correlation among indicators
within the same construct (i.e. monotrait–heteromethod correlations).
The resulting HTMT(2) values are interpreted as estimates of inter-construct
correlations. Absolute values of the correlations are recommended to
calculate the HTMT matrix, and are required to calculate HTMT2. Correlations
are estimated using the lavaan::lavCor() function.
Usage
htmt(model, data = NULL, sample.cov = NULL, missing = "listwise",
ordered = NULL, absolute = TRUE, htmt2 = TRUE)
Arguments
model |
lavaan |
data |
A |
sample.cov |
A covariance or correlation matrix can be used, instead of
|
missing |
If |
ordered |
Character vector. Only used if object is a |
absolute |
|
htmt2 |
|
Value
A matrix showing HTMT(2) values (i.e., discriminant validity) between each pair of factors.
Author(s)
Ylenio Longo (University of Nottingham; yleniolongo@gmail.com)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Henseler, J., Ringle, C. M., & Sarstedt, M. (2015). A new criterion for assessing discriminant validity in variance-based structural equation modeling. Journal of the Academy of Marketing Science, 43(1), 115–135. doi:10.1007/s11747-014-0403-8
Roemer, E., Schuberth, F., & Henseler, J. (2021). HTMT2—An improved criterion for assessing discriminant validity in structural equation modeling. Industrial Management & Data Systems, 121(12), 2637–2650. doi:10.1108/IMDS-02-2021-0082
Voorhees, C. M., Brady, M. K., Calantone, R., & Ramirez, E. (2016). Discriminant validity testing in marketing: An analysis, causes for concern, and proposed remedies. Journal of the Academy of Marketing Science, 44(1), 119–134. doi:10.1007/s11747-015-0455-4
Examples
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '
dat <- HolzingerSwineford1939[, paste0("x", 1:9)]
htmt(HS.model, dat)
## save covariance matrix
HS.cov <- cov(HolzingerSwineford1939[, paste0("x", 1:9)])
## HTMT using arithmetic mean
htmt(HS.model, sample.cov = HS.cov, htmt2 = FALSE)
Specify starting values from a lavaan output
Description
This function saves the parameter estimates of a lavaan output and imposes those estimates as starting values for another analysis model. Free parameters with the same names or the same labels across the two models are assigned the new starting values. This function may help to increase the chance of convergence in a complex model (e.g., a multitrait-multimethod model or a complex longitudinal invariance model).
Usage
imposeStart(out, expr, silent = TRUE)
Arguments
out |
The |
expr |
The original code that users use to run a lavaan model |
silent |
Logical to print the parameter table with new starting values |
Value
A fitted lavaan model
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
Examples
## The following example shows that the longitudinal weak-invariance model
## using effect coding did not converge with three time points, but did
## converge with two time points. Thus, the parameter estimates from the
## model with two time points are used as starting values for the model
## with three time points. The model with the new starting values converges.
weak2time <- '
# Loadings
f1t1 =~ LOAD1*y1t1 + LOAD2*y2t1 + LOAD3*y3t1
f1t2 =~ LOAD1*y1t2 + LOAD2*y2t2 + LOAD3*y3t2
# Factor Variances
f1t1 ~~ f1t1
f1t2 ~~ f1t2
# Factor Covariances
f1t1 ~~ f1t2
# Error Variances
y1t1 ~~ y1t1
y2t1 ~~ y2t1
y3t1 ~~ y3t1
y1t2 ~~ y1t2
y2t2 ~~ y2t2
y3t2 ~~ y3t2
# Error Covariances
y1t1 ~~ y1t2
y2t1 ~~ y2t2
y3t1 ~~ y3t2
# Factor Means
f1t1 ~ NA*1
f1t2 ~ NA*1
# Measurement Intercepts
y1t1 ~ INT1*1
y2t1 ~ INT2*1
y3t1 ~ INT3*1
y1t2 ~ INT4*1
y2t2 ~ INT5*1
y3t2 ~ INT6*1
# Constraints for Effect-coding Identification
LOAD1 == 3 - LOAD2 - LOAD3
INT1 == 0 - INT2 - INT3
INT4 == 0 - INT5 - INT6
'
model2time <- lavaan(weak2time, data = exLong)
weak3time <- '
# Loadings
f1t1 =~ LOAD1*y1t1 + LOAD2*y2t1 + LOAD3*y3t1
f1t2 =~ LOAD1*y1t2 + LOAD2*y2t2 + LOAD3*y3t2
f1t3 =~ LOAD1*y1t3 + LOAD2*y2t3 + LOAD3*y3t3
# Factor Variances
f1t1 ~~ f1t1
f1t2 ~~ f1t2
f1t3 ~~ f1t3
# Factor Covariances
f1t1 ~~ f1t2 + f1t3
f1t2 ~~ f1t3
# Error Variances
y1t1 ~~ y1t1
y2t1 ~~ y2t1
y3t1 ~~ y3t1
y1t2 ~~ y1t2
y2t2 ~~ y2t2
y3t2 ~~ y3t2
y1t3 ~~ y1t3
y2t3 ~~ y2t3
y3t3 ~~ y3t3
# Error Covariances
y1t1 ~~ y1t2
y2t1 ~~ y2t2
y3t1 ~~ y3t2
y1t1 ~~ y1t3
y2t1 ~~ y2t3
y3t1 ~~ y3t3
y1t2 ~~ y1t3
y2t2 ~~ y2t3
y3t2 ~~ y3t3
# Factor Means
f1t1 ~ NA*1
f1t2 ~ NA*1
f1t3 ~ NA*1
# Measurement Intercepts
y1t1 ~ INT1*1
y2t1 ~ INT2*1
y3t1 ~ INT3*1
y1t2 ~ INT4*1
y2t2 ~ INT5*1
y3t2 ~ INT6*1
y1t3 ~ INT7*1
y2t3 ~ INT8*1
y3t3 ~ INT9*1
# Constraints for Effect-coding Identification
LOAD1 == 3 - LOAD2 - LOAD3
INT1 == 0 - INT2 - INT3
INT4 == 0 - INT5 - INT6
INT7 == 0 - INT8 - INT9
'
### The following command does not provide a convergent result
# model3time <- lavaan(weak3time, data = exLong)
### Use starting values from the model with two time points
model3time <- imposeStart(model2time, lavaan(weak3time, data = exLong))
summary(model3time)
Make products of indicators using no centering, mean centering, double-mean centering, or residual centering
Description
The indProd function makes products of indicators using no
centering, mean centering, double-mean centering, or residual centering. The
orthogonalize function is a shortcut of the indProd function
for making residual-centered product indicators.
Usage
indProd(data, var1, var2, var3 = NULL, match = TRUE, meanC = TRUE,
residualC = FALSE, doubleMC = TRUE, namesProd = NULL)
orthogonalize(data, var1, var2, var3 = NULL, match = TRUE,
namesProd = NULL)
Arguments
data |
The desired data to be transformed. |
var1 |
Names or indices of the variables loaded on the first factor |
var2 |
Names or indices of the variables loaded on the second factor |
var3 |
Names or indices of the variables loaded on the third factor (for three-way interaction) |
match |
Specify |
meanC |
Specify |
residualC |
Specify |
doubleMC |
Specify |
namesProd |
The names of resulting products |
Value
The original data attached with the products.
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
Alexander Schoemann (East Carolina University; schoemanna@ecu.edu)
References
Marsh, H. W., Wen, Z. & Hau, K. T. (2004). Structural equation models of latent interactions: Evaluation of alternative estimation strategies and indicator construction. Psychological Methods, 9(3), 275–300. doi:10.1037/1082-989X.9.3.275
Lin, G. C., Wen, Z., Marsh, H. W., & Lin, H. S. (2010). Structural equation models of latent interactions: Clarification of orthogonalizing and double-mean-centering strategies. Structural Equation Modeling, 17(3), 374–391. doi:10.1080/10705511.2010.488999
Little, T. D., Bovaird, J. A., & Widaman, K. F. (2006). On the merits of orthogonalizing powered and product terms: Implications for modeling interactions among latent variables. Structural Equation Modeling, 13(4), 497–519. doi:10.1207/s15328007sem1304_1
See Also
- probe2WayMC() for probing the two-way latent interaction when the results are obtained from mean-centering or double-mean centering.
- probe3WayMC() for probing the three-way latent interaction when the results are obtained from mean-centering or double-mean centering.
- probe2WayRC() for probing the two-way latent interaction when the results are obtained from the residual-centering approach.
- probe3WayRC() for probing the three-way latent interaction when the results are obtained from the residual-centering approach.
- plotProbe() to plot the simple intercepts and slopes of the latent interaction.
Examples
## Mean centering / two-way interaction / match-paired
dat <- indProd(attitude[ , -1], var1 = 1:3, var2 = 4:6)
## Residual centering / two-way interaction / all possible products (match = FALSE)
dat2 <- indProd(attitude[ , -1], var1 = 1:3, var2 = 4:6, match = FALSE,
meanC = FALSE, residualC = TRUE, doubleMC = FALSE)
## Double-mean centering / two-way interaction / all possible products (match = FALSE)
dat3 <- indProd(attitude[ , -1], var1 = 1:3, var2 = 4:6, match = FALSE,
meanC = TRUE, residualC = FALSE, doubleMC = TRUE)
## Mean centering / three-way interaction / match-paired
dat4 <- indProd(attitude[ , -1], var1 = 1:2, var2 = 3:4, var3 = 5:6)
## Residual centering / three-way interaction / all possible products (match = FALSE)
dat5 <- orthogonalize(attitude[ , -1], var1 = 1:2, var2 = 3:4, var3 = 5:6,
match = FALSE)
## Double-mean centering / three-way interaction / all possible products (match = FALSE)
dat6 <- indProd(attitude[ , -1], var1 = 1:2, var2 = 3:4, var3 = 5:6,
match = FALSE, meanC = TRUE, residualC = TRUE,
doubleMC = TRUE)
## To add product-indicators to multiple-imputed data sets
if (requireNamespace("lavaan.mi")) {
data(HS20imps, package = "lavaan.mi")
## apply indProd() to the list of data.frames
imps2 <- lapply(HS20imps, indProd,
var1 = c("x1","x2","x3"), var2 = c("x4","x5","x6"))
## verify:
lapply(imps2, head)
}
Generate data via the Kaiser-Dickman (1962) algorithm.
Description
Given a covariance matrix and sample size, generate raw data that correspond to the covariance matrix. Data can be generated to match the covariance matrix exactly, or to be a sample from the population covariance matrix.
Usage
kd(covmat, n, type = c("exact", "sample"))
Arguments
covmat |
a symmetric, positive definite covariance matrix |
n |
the sample size for the data that will be generated |
type |
type of data generation. |
Details
By default, R's cov() function divides by n - 1. The data
generated by this algorithm result in a covariance matrix that matches
covmat exactly only if the sample covariance matrix is computed by
dividing by n instead of n - 1.
Value
kd returns a data matrix of dimension n by
nrow(covmat).
Author(s)
Ed Merkle (University of Missouri; merklee@missouri.edu)
References
Kaiser, H. F. and Dickman, K. (1962). Sample and population score matrices and sample correlation matrices from an arbitrary population correlation matrix. Psychometrika, 27(2), 179–182. doi:10.1007/BF02289635
Examples
#### First Example
## Get data
dat <- HolzingerSwineford1939[ , 7:15]
hs.n <- nrow(dat)
## Covariance matrix divided by n
hscov <- ((hs.n-1)/hs.n) * cov(dat)
## Generate new, raw data corresponding to hscov
newdat <- kd(hscov, hs.n)
## Difference between new covariance matrix and hscov is minimal
newcov <- (hs.n-1)/hs.n * cov(newdat)
summary(as.numeric(hscov - newcov))
## Generate sample data, treating hscov as population matrix
newdat2 <- kd(hscov, hs.n, type = "sample")
#### Another example
## Define a covariance matrix
covmat <- matrix(0, 3, 3)
diag(covmat) <- 1.5
covmat[2:3,1] <- c(1.3, 1.7)
covmat[3,2] <- 2.1
covmat <- covmat + t(covmat)
## Generate data of size 300 that have this covariance matrix
rawdat <- kd(covmat, 300)
## Covariances are exact if we compute sample covariance matrix by
## dividing by n (vs by n - 1)
summary(as.numeric((299/300)*cov(rawdat) - covmat))
## Generate data of size 300 where covmat is the population covariance matrix
rawdat2 <- kd(covmat, 300, type = "sample")
Finding excessive kurtosis
Description
Finding excessive kurtosis (g_{2}) of an object
Usage
kurtosis(object, population = FALSE)
Arguments
object |
A vector used to find an excessive kurtosis |
population |
|
Details
The excessive kurtosis computed by default is g_{2}, the fourth
standardized moment of the empirical distribution of object.
The population parameter excessive kurtosis \gamma_{2} formula is
\gamma_{2} = \frac{\mu_{4}}{\mu^{2}_{2}} - 3,
where \mu_{i} denotes the i-th order central moment.
The excessive kurtosis formula for the sample statistic g_{2} is
g_{2} = \frac{k_{4}}{k^{2}_{2}} - 3,
where k_{i} is the i-th order k-statistic.
The sampling variance of the excessive kurtosis is
Var(\hat{g}_{2}) = \frac{24}{N},
where N is the sample size.
Value
The excessive kurtosis value of the object, with a test statistic if
population is specified as FALSE.
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
References
Weisstein, Eric W. (n.d.). Kurtosis. Retrieved from MathWorld–A Wolfram Web Resource: https://mathworld.wolfram.com/Kurtosis.html
See Also
- skew() to find the univariate skewness of a variable
- mardiaSkew() to find Mardia's multivariate skewness of a set of variables
- mardiaKurtosis() to find Mardia's multivariate kurtosis of a set of variables
Examples
kurtosis(1:5)
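## Sketch contrasting the sample statistic (with its test statistic) and the
## population formula described in Details:
set.seed(123)
x <- rnorm(200)
kurtosis(x)                      # sample g2, near 0 for normal data
kurtosis(x, population = TRUE)   # population formula, no test statistic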
emmeans Support Functions for lavaan Models
Description
Provide emmeans support for lavaan objects
Usage
recover_data.lavaan(object, lavaan.DV, data = NULL, ...)
emm_basis.lavaan(object, trms, xlev, grid, lavaan.DV, ...)
Arguments
object |
An object of class |
lavaan.DV |
|
data |
An optional |
... |
Further arguments passed to |
trms, xlev, grid |
See |
Details
Supported DVs
lavaan.DV must be an endogenous variable, by appearing on
the left-hand side of either a regression operator ("~")
or an intercept operator ("~1"), or both.
lavaan.DV can also be a vector of endogenous variables, in which
case they will be treated by emmeans as a multivariate outcome
(often, this indicates repeated measures) represented by an additional
factor named rep.meas by default. The mult.name= argument
can be used to override this default name.
Unsupported Models
This functionality does not support the following models:
- Multi-level models.
- Models not fit to a data.frame (i.e., models fit to a covariance matrix).
Dealing with Fixed Parameters
Fixed parameters (set with lavaan's modifiers) are treated as-is:
their values are set by the user, and they have an SE of 0 (as such,
they do not co-vary with any other parameter).
Dealing with Multigroup Models
If a multigroup model is supplied, a factor is added to the reference grid,
the name matching the group argument supplied when fitting the model.
Note that you must set nesting = NULL.
Dealing with Missing Data
Limited testing suggests that these functions do work when the model was fit to incomplete data.
Dealing with Factors
By default emmeans recognizes binary variables (0,1) as a "factor"
with two levels (and not a continuous variable). With some clever contrast
definitions it should be possible to get the desired emmeans / contrasts.
See example below.
Author(s)
Mattan S. Ben-Shachar (Ben-Gurion University of the Negev; matanshm@post.bgu.ac.il)
Examples
## Not run:
library(lavaan)
library(emmeans)
#### Moderation Analysis ####
mean_sd <- function(x) mean(x) + c(-sd(x), 0, sd(x))
model <- '
# regressions
Sepal.Length ~ b1 * Sepal.Width + b2 * Petal.Length + b3 * Sepal.Width:Petal.Length
# define mean parameter label for centered math for use in simple slopes
Sepal.Width ~ Sepal.Width.mean * 1
# define variance parameter label for centered math for use in simple slopes
Sepal.Width ~~ Sepal.Width.var * Sepal.Width
# simple slopes for condition effect
SD.below := b2 + b3 * (Sepal.Width.mean - sqrt(Sepal.Width.var))
mean := b2 + b3 * (Sepal.Width.mean)
SD.above := b2 + b3 * (Sepal.Width.mean + sqrt(Sepal.Width.var))
'
semFit <- sem(model = model,
data = iris)
## Compare simple slopes
# From `emtrends`
test(
emtrends(semFit, ~ Sepal.Width, "Petal.Length",
lavaan.DV = "Sepal.Length",
cov.red = mean_sd)
)
# From lavaan
parameterEstimates(semFit, output = "pretty")[13:15, ]
# Identical slopes.
# SEs differ due to lavaan estimating uncertainty of the mean / SD
# of Sepal.Width, whereas emmeans uses the mean+-SD as is (fixed).
#### Latent DV ####
model <- '
LAT1 =~ Sepal.Length + Sepal.Width
LAT1 ~ b1 * Petal.Width + 1 * Petal.Length
Petal.Length ~ Petal.Length.mean * 1
V1 := 1 * Petal.Length.mean + 1 * b1
V2 := 1 * Petal.Length.mean + 2 * b1
'
semFit <- sem(model = model,
data = iris, std.lv = TRUE)
## Compare emmeans
# From emmeans
test(
emmeans(semFit, ~ Petal.Width,
lavaan.DV = "LAT1",
at = list(Petal.Width = 1:2))
)
# From lavaan
parameterEstimates(semFit, output = "pretty")[15:16, ]
# Identical means.
# SEs differ due to lavaan estimating uncertainty of the mean
# of Petal.Length, whereas emmeans uses the mean as is.
#### Multi-Variate DV ####
model <- '
ind60 =~ x1 + x2 + x3
# metric invariance
dem60 =~ y1 + a*y2 + b*y3 + c*y4
dem65 =~ y5 + a*y6 + b*y7 + c*y8
# scalar invariance
y1 + y5 ~ d*1
y2 + y6 ~ e*1
y3 + y7 ~ f*1
y4 + y8 ~ g*1
# regressions (slopes differ: interaction with time)
dem60 ~ b1*ind60
dem65 ~ b2*ind60 + NA*1 + Mean.Diff*1
# residual correlations
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8
# conditional mean differences (besides mean(ind60) == 0)
low := (-1*b2 + Mean.Diff) - (-1*b1) # 1 SD below M
high := (b2 + Mean.Diff) - b1 # 1 SD above M
'
semFit <- sem(model, data = PoliticalDemocracy)
## Compare contrasts
# From emmeans
emmeans(semFit, pairwise ~ rep.meas|ind60,
lavaan.DV = c("dem60","dem65"),
at = list(ind60 = c(-1,1)))[[2]]
# From lavaan
parameterEstimates(semFit, output = "pretty")[49:50, ]
#### Multi Group ####
model <- 'x1 ~ c(int1, int2)*1 + c(b1, b2)*ageyr
diff_11 := (int2 + b2*11) - (int1 + b1*11)
diff_13 := (int2 + b2*13) - (int1 + b1*13)
diff_15 := (int2 + b2*15) - (int1 + b1*15)
'
semFit <- sem(model, group = "school", data = HolzingerSwineford1939)
## Compare contrasts
# From emmeans (note `nesting = NULL`)
emmeans(semFit, pairwise ~ school | ageyr, lavaan.DV = "x1",
at = list(ageyr = c(11, 13, 15)), nesting = NULL)[[2]]
# From lavaan
parameterEstimates(semFit, output = "pretty")
#### Dealing with factors ####
warpbreaks <- cbind(warpbreaks,
model.matrix(~ wool + tension, data = warpbreaks))
model <- "
# Split for convenience
breaks ~ 1
breaks ~ woolB
breaks ~ tensionM + tensionH
breaks ~ woolB:tensionM + woolB:tensionH
"
semFit <- sem(model, warpbreaks)
## Compare contrasts
# From lm -> emmeans
lmFit <- lm(breaks ~ wool * tension, data = warpbreaks)
lmEM <- emmeans(lmFit, ~ tension + wool)
contrast(lmEM, method = data.frame(L_all = c(-1, .05, 0.5),
M_H = c(0, 1, -1)), by = "wool")
# From lavaan -> emmeans
lavEM <- emmeans(semFit, ~ tensionM + tensionH + woolB,
lavaan.DV = "breaks")
contrast(lavEM,
method = list(
"L_all|A" = c(c(-1, .05, 0.5, 0), rep(0, 4)),
"M_H |A" = c(c(0, 1, -1, 0), rep(0, 4)),
"L_all|A" = c(rep(0, 4), c(-1, .05, 0.5, 0)),
"M_H |A" = c(rep(0, 4), c(0, 1, -1, 0))
))
## End(Not run)
Find standardized factor loading from coefficient alpha
Description
Find standardized factor loading from coefficient alpha assuming that all items have equal loadings.
Usage
loadingFromAlpha(alpha, ni)
Arguments
alpha |
A desired coefficient alpha value. |
ni |
A desired number of items. |
Value
result |
The standardized factor loadings that make desired coefficient alpha with specified number of items. |
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
Examples
loadingFromAlpha(0.8, 4)
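## Verification sketch, assuming the standardized-alpha relation
## alpha = k * lambda^2 / (1 + (k - 1) * lambda^2) for k standardized items
## with equal loadings:
lam <- loadingFromAlpha(0.8, 4)
4 * lam^2 / (1 + 3 * lam^2)   # should recover (approximately) 0.8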
Calculate Population Moments for Ordinal Data Treated as Numeric
Description
This function calculates ordinal-scale moments implied by LRV-scale moments
Usage
lrv2ord(Sigma, Mu, thresholds, cWts)
Arguments
Sigma |
Population covariance |
Mu |
Optional |
thresholds |
Either a single |
cWts |
Optional (default when missing is to use 0 for the lowest
category, followed by successive integers for each higher category).
Either a single |
Details
Binary and ordinal data are frequently accommodated in SEM by incorporating a threshold model that links each observed categorical response variable to a corresponding latent response variable that is typically assumed to be normally distributed (Kamata & Bauer, 2008; Wirth & Edwards, 2007). This function can be useful for real-data analysis or for designing Monte Carlo simulations, as described by Jorgensen and Johnson (2022).
Value
A list including the LRV-scale population moments (means,
covariance matrix, correlation matrix, and thresholds), the category
weights, a data.frame of implied univariate moments (means,
SDs, skewness, and excess kurtosis, i.e., kurtosis in excess of 3, which is
the kurtosis of the normal distribution) for discretized data treated as
numeric, and the implied covariance and correlation matrix of
discretized data treated as numeric.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
Andrew R. Johnson (Curtin University; andrew.johnson@curtin.edu.au)
References
Jorgensen, T. D., & Johnson, A. R. (2022). How to derive expected values of structural equation model parameters when treating discrete data as continuous. Structural Equation Modeling, 29(4), 639–650. doi:10.1080/10705511.2021.1988609
Kamata, A., & Bauer, D. J. (2008). A note on the relation between factor analytic and item response theory models. Structural Equation Modeling, 15(1), 136–153. doi:10.1080/10705510701758406
Wirth, R. J., & Edwards, M. C. (2007). Item factor analysis: Current approaches and future directions. Psychological Methods, 12(1), 58–79. doi:10.1037/1082-989X.12.1.58
Examples
## SCENARIO 1: DIRECTLY SPECIFY POPULATION PARAMETERS
## specify population model in LISREL matrices
Nu <- rep(0, 4)
Alpha <- c(1, -0.5)
Lambda <- matrix(c(1, 1, 0, 0, 0, 0, 1, 1), nrow = 4, ncol = 2,
dimnames = list(paste0("y", 1:4), paste0("eta", 1:2)))
Psi <- diag(c(1, .75))
Theta <- diag(4)
Beta <- matrix(c(0, .5, 0, 0), nrow = 2, ncol = 2)
## calculate model-implied population means and covariance matrix
## of latent response variables (LRVs)
IB <- solve(diag(2) - Beta) # to save time and space
Mu_LRV <- Nu + Lambda %*% IB %*% Alpha
Sigma_LRV <- Lambda %*% IB %*% Psi %*% t(IB) %*% t(Lambda) + Theta
## Specify (unstandardized) thresholds to discretize normally distributed data
## generated from Mu_LRV and Sigma_LRV, based on marginal probabilities
PiList <- list(y1 = c(.25, .5, .25),
y2 = c(.17, .33, .33, .17),
y3 = c(.1, .2, .4, .2, .1),
## make final variable highly asymmetric
y4 = c(.33, .25, .17, .12, .08, .05))
sapply(PiList, sum) # all sum to 100%
CumProbs <- sapply(PiList, cumsum)
## unstandardized thresholds
TauList <- mapply(qnorm, p = lapply(CumProbs, function(x) x[-length(x)]),
m = Mu_LRV, sd = sqrt(diag(Sigma_LRV)))
for (i in 1:4) names(TauList[[i]]) <- paste0(names(TauList)[i], "|t",
1:length(TauList[[i]]))
## assign numeric weights to each category (optional, see default)
NumCodes <- list(y1 = c(-0.5, 0, 0.5), y2 = 0:3, y3 = 1:5, y4 = 1:6)
## Calculate Population Moments for Numerically Coded Ordinal Variables
lrv2ord(Sigma = Sigma_LRV, Mu = Mu_LRV, thresholds = TauList, cWts = NumCodes)
## SCENARIO 2: USE ESTIMATED PARAMETERS AS POPULATION
data(datCat) # already stored as c("ordered","factor")
fit <- cfa(' f =~ 1*u1 + 1*u2 + 1*u3 + 1*u4 ', data = datCat)
lrv2ord(Sigma = fit, thresholds = fit) # use same fit for both
## or use estimated thresholds with specified parameters, but note that
## lrv2ord() will only extract standardized thresholds
dimnames(Sigma_LRV) <- list(paste0("u", 1:4), paste0("u", 1:4))
lrv2ord(Sigma = cov2cor(Sigma_LRV), thresholds = fit)
Finding Mardia's multivariate kurtosis
Description
Finding Mardia's multivariate kurtosis of multiple variables
Usage
mardiaKurtosis(dat, use = "everything")
Arguments
dat |
The target matrix or data frame with multiple variables |
use |
Missing data handling method from the |
Details
The Mardia's multivariate kurtosis formula (Mardia, 1970) is
b_{2, d} = \frac{1}{n}\sum^n_{i=1}\left[ \left(\bold{X}_i -
\bold{\bar{X}} \right)^{'} \bold{S}^{-1} \left(\bold{X}_i -
\bold{\bar{X}} \right) \right]^2,
where d is the number of variables, X is the target
dataset with multiple variables, n is the sample size, \bold{S}
is the sample covariance matrix of the target dataset, and
\bold{\bar{X}} is the mean vector of the target dataset. Under
multivariate normality, b_{2,d} is asymptotically normally distributed
with mean d(d + 2) and variance 8d(d + 2)/n.
Value
The value of Mardia's multivariate kurtosis, along with a test statistic.
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
References
Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57(3), 519–530. doi:10.2307/2334770
See Also
- skew() to find the univariate skewness of a variable
- kurtosis() to find the univariate excessive kurtosis of a variable
- mardiaSkew() to find Mardia's multivariate skewness of a set of variables
Examples
library(lavaan)
mardiaKurtosis(HolzingerSwineford1939[ , paste0("x", 1:9)])
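## Sketch of the asymptotic reference distribution given in Details: under
## multivariate normality, b_{2,d} is approximately normal with mean d(d + 2)
## and variance 8d(d + 2)/n
dat <- HolzingerSwineford1939[ , paste0("x", 1:9)]
d <- ncol(dat); n <- nrow(dat)
c(mean = d * (d + 2), SE = sqrt(8 * d * (d + 2) / n))
mardiaKurtosis(dat)   # compare the estimate and test statistic to these values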
Finding Mardia's multivariate skewness
Description
Finding Mardia's multivariate skewness of multiple variables
Usage
mardiaSkew(dat, use = "everything")
Arguments
dat |
The target matrix or data frame with multiple variables |
use |
Missing data handling method from the |
Details
The Mardia's multivariate skewness formula (Mardia, 1970) is
b_{1, d} = \frac{1}{n^2}\sum^n_{i=1}\sum^n_{j=1}\left[
\left(\bold{X}_i - \bold{\bar{X}} \right)^{'} \bold{S}^{-1}
\left(\bold{X}_j - \bold{\bar{X}} \right) \right]^3,
where d is the number of variables, X is the target dataset
with multiple variables, n is the sample size, \bold{S} is
the sample covariance matrix of the target dataset, and \bold{\bar{X}}
is the mean vector of the target dataset. Under multivariate normality,
\frac{n}{6}b_{1,d} is asymptotically distributed as a \chi^2
distribution with d(d + 1)(d + 2)/6 degrees of freedom.
Value
The value of Mardia's multivariate skewness, along with a test statistic.
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
References
Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57(3), 519–530. doi:10.2307/2334770
See Also
- skew() to find the univariate skewness of a variable
- kurtosis() to find the univariate excessive kurtosis of a variable
- mardiaKurtosis() to find Mardia's multivariate kurtosis of a set of variables
Examples
library(lavaan)
mardiaSkew(HolzingerSwineford1939[ , paste0("x", 1:9)])
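## Sketch of the reference distribution given in Details: under multivariate
## normality, (n/6) * b_{1,d} is approximately chi-squared with
## d(d + 1)(d + 2)/6 degrees of freedom
d <- 9                       # number of variables in the example above
d * (d + 1) * (d + 2) / 6    # df of the chi-squared reference distribution
mardiaSkew(HolzingerSwineford1939[ , paste0("x", 1:9)])  # compare reported df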
Calculate maximal reliability
Description
Calculate maximal reliability of a scale
Usage
maximalRelia(object, omit.imps = c("no.conv", "no.se"))
Arguments
object |
A lavaan::lavaan or lavaan.mi::lavaan.mi object, expected to contain only exogenous common factors (i.e., a CFA model). |
omit.imps |
|
Details
A composite score (W) is a weighted sum of item scores:
W = \bold{w}^\prime \bold{x} ,
where \bold{x} is a k \times 1 vector of the scores of each
item, \bold{w} is a k \times 1 weight vector of each item, and
k represents the number of items. Maximal reliability is
obtained by finding \bold{w} such that reliability attains its maximum
(Li, 1997; Raykov, 2012). Note that the reliability can be obtained by
\rho = \frac{\bold{w}^\prime \bold{S}_T \bold{w}}{\bold{w}^\prime
\bold{S}_X \bold{w}}
where \bold{S}_T is the covariance matrix explained by true scores and
\bold{S}_X is the observed covariance matrix. A numerical method is used
to find \bold{w} in this function.
For continuous items, \bold{S}_T can be calculated by
\bold{S}_T = \Lambda \Psi \Lambda^\prime,
where \Lambda is the factor loading matrix and \Psi is the
covariance matrix among factors. \bold{S}_X is directly obtained by
covariance among items.
For categorical items, Green and Yang's (2009) method is used for
calculating \bold{S}_T and \bold{S}_X. Element (i, j) of
\bold{S}_T can be calculated by
\left[\bold{S}_T\right]_{ij} = \sum^{C_i - 1}_{c_i = 1} \sum^{C_j -
1}_{c_j = 1} \Phi_2\left( \tau_{x_{c_i}}, \tau_{x_{c_j}}, \left[ \Lambda
\Psi \Lambda^\prime \right]_{ij} \right) - \sum^{C_i - 1}_{c_i = 1}
\Phi_1(\tau_{x_{c_i}}) \sum^{C_j - 1}_{c_j = 1} \Phi_1(\tau_{x_{c_j}}),
where C_i and C_j represent the numbers of thresholds of Items
i and j, \tau_{x_{c_i}} represents threshold c_i
of Item i, \tau_{x_{c_j}} represents threshold c_j of
Item j, \Phi_1(\tau_{x_{c_i}}) is the cumulative probability of
\tau_{x_{c_i}} given a univariate standard normal cumulative
distribution, and \Phi_2\left( \tau_{x_{c_i}}, \tau_{x_{c_j}}, \rho
\right) is the joint cumulative probability of \tau_{x_{c_i}} and
\tau_{x_{c_j}} given a bivariate standard normal cumulative
distribution with a correlation of \rho.
Each element of \bold{S}_X can be calculated by
\left[\bold{S}_X\right]_{ij} = \sum^{C_i - 1}_{c_i = 1} \sum^{C_j -
1}_{c_j = 1} \Phi_2\left( \tau_{V_{c_i}}, \tau_{V_{c_j}}, \rho^*_{ij}
\right) - \sum^{C_i - 1}_{c_i = 1} \Phi_1(\tau_{V_{c_i}}) \sum^{C_j -
1}_{c_j = 1} \Phi_1(\tau_{V_{c_j}}),
where \rho^*_{ij} is the polychoric correlation between Items i
and j.
Value
Maximal reliability values for each group. The maximal-reliability
weights are also provided. Users may extract the weights with the
attr function (see example below).
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
References
Li, H. (1997). A unifying expression for the maximal reliability of a linear composite. Psychometrika, 62(2), 245–249. doi:10.1007/BF02295278
Raykov, T. (2012). Scale construction and development using structural equation modeling. In R. H. Hoyle (Ed.), Handbook of structural equation modeling (pp. 472–494). New York, NY: Guilford.
See Also
reliability() for reliability of an unweighted
composite score
Examples
total <- 'f =~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 '
fit <- cfa(total, data = HolzingerSwineford1939)
maximalRelia(fit)
# Extract the weight
mr <- maximalRelia(fit)
attr(mr, "weight")
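## For the continuous single-group case, the quantities in the Details can be
## cross-checked by hand: the maximal reliability equals the largest generalized
## eigenvalue of solve(S_X) %*% S_T, with weights given by the corresponding
## eigenvector. This is only an illustrative sketch; results may differ slightly
## from maximalRelia() depending on which covariance matrix is used.
est <- lavInspect(fit, "est")
ST  <- est$lambda %*% est$psi %*% t(est$lambda)   # true-score covariance
SX  <- lavInspect(fit, "sampstat")$cov            # observed covariance among items
eig <- eigen(solve(SX) %*% ST)
max(Re(eig$values))        # maximal reliability (compare with maximalRelia(fit))
Re(eig$vectors[, 1])       # optimal weights, up to an arbitrary scaling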
Syntax for measurement equivalence
Description
Automatically generates lavaan model syntax to specify a confirmatory
factor analysis (CFA) model with equality constraints imposed on
user-specified measurement (or structural) parameters. Optionally returns
the fitted model (if data are provided) representing some chosen level of
measurement equivalence/invariance across groups and/or repeated measures.
Usage
measEq.syntax(configural.model, ..., ID.fac = "std.lv",
ID.cat = "Wu.Estabrook.2016", ID.thr = c(1L, 2L), group = NULL,
group.equal = "", group.partial = "", longFacNames = list(),
longIndNames = list(), long.equal = "", long.partial = "",
auto = "all", warn = TRUE, debug = FALSE, return.fit = FALSE)
Arguments
configural.model |
A model with no measurement-invariance constraints
(i.e., representing only configural invariance), unless required for model
identification.
Note that the specified or fitted model must not contain any latent structural parameters (i.e., it must be a CFA model), unless they are higher-order constructs with latent indicators (i.e., a second-order CFA). |
... |
Additional arguments (e.g., |
ID.fac |
See Kloessner & Klopp (2019) for details about all three methods. |
ID.cat |
See Details and References for more information. |
ID.thr |
|
group |
optional |
group.equal |
optional |
group.partial |
optional |
longFacNames |
optional named |
longIndNames |
optional named |
long.equal |
optional |
long.partial |
optional |
auto |
Used to automatically include autocorrelated measurement errors
among repeatedly measured indicators in |
warn, debug |
|
return.fit |
|
Details
This function is a pedagogical and analytical tool to generate model syntax representing some level of measurement equivalence/invariance across any combination of multiple groups and/or repeated measures. Support is provided for confirmatory factor analysis (CFA) models with simple or complex structure (i.e., cross-loadings and correlated residuals are allowed). For any complexities that exceed the limits of automation, this function is intended to still be useful by providing a means to generate syntax that users can easily edit to accommodate their unique situations.
Limited support is provided for bifactor models and higher-order constructs.
Because bifactor models have cross-loadings by definition, the option
ID.fac = "effects.code" is unavailable. ID.fac = "UV" is
recommended for bifactor models, but ID.fac = "UL" is available on
the condition that each factor has a unique first indicator in the
configural.model. In order to maintain generality, higher-order
factors may include a mix of manifest and latent indicators, but they
therefore require ID.fac = "UL" to avoid complications with
differentiating lower-order vs. higher-order (or mixed-level) factors.
The keyword "loadings" in group.equal or long.equal
constrains factor loadings of all manifest indicators (including loadings on
higher-order factors that also have latent indicators), whereas the keyword
"regressions" constrains factor loadings of latent indicators. Users
can edit the model syntax manually to adjust constraints as necessary, or
clever use of the group.partial or long.partial arguments
could still make it possible for users to automate their model syntax.
The keyword "intercepts" constrains the intercepts of all manifest
indicators, and the keyword "means" constrains intercepts and means
of all latent common factors, regardless of whether they are latent
indicators of higher-order factors. To test equivalence of lower-order and
higher-order intercepts/means in separate steps, the user can either
manually edit their generated syntax or conscientiously exploit the
group.partial or long.partial arguments as necessary.
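For example, the following sketch (not run; the model, data, and parameter
strings are placeholders) shows the lavaan-style format that
group.partial accepts for freeing individual loadings or intercepts:
# measEq.syntax(configural.model = mod, data = dat, group = "g",
#               group.equal = c("loadings", "intercepts"),
#               group.partial = c("f1 =~ x2",    # free the loading of x2 on f1
#                                 "x3 ~ 1"))     # free the intercept of x3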
ID.fac: If the configural.model fixes any (e.g.,
the first) factor loadings, the generated syntax object will retain those
fixed values. This allows the user to retain additional constraints that
might be necessary (e.g., if there are only 1 or 2 indicators). Some methods
must be used in conjunction with other settings:
- ID.cat = "Millsap" requires ID.fac = "UL" and parameterization = "theta".
- ID.cat = "LISREL" requires parameterization = "theta".
- ID.fac = "effects.code" is unavailable when there are any cross-loadings.
ID.cat: Wu & Estabrook (2016) recommended constraining
thresholds to equality first, and doing so should allow releasing any
identification constraints no longer needed. For each ordered
indicator, constraining one threshold to equality will allow the item's
intercepts to be estimated in all but the first group or repeated measure.
Constraining a second threshold (if applicable) will allow the item's
(residual) variance to be estimated in all but the first group or repeated
measure. For binary data, there is no independent test of threshold,
intercept, or residual-variance equality. Equivalence of thresholds must
also be assumed for three-category indicators. These guidelines provide the
least restrictive assumptions and tests, and are therefore the default.
The default setting in Mplus is similar to Wu & Estabrook (2016),
except that intercepts are always constrained to zero (so they are assumed
to be invariant without testing them). Millsap & Tein (2004) recommended
parameterization = "theta" and identified an item's residual variance
in all but the first group (or occasion; Liu et al., 2017) by constraining
its intercept to zero and one of its thresholds to equality. A second
threshold for the reference indicator (so ID.fac = "UL") is used to
identify the common-factor means in all but the first group/occasion. The
LISREL software fixes the first threshold to zero and (if applicable) the
second threshold to 1, and assumes any remaining thresholds to be equal
across groups / repeated measures; thus, the intercepts are always
identified, and residual variances (parameterization = "theta") are
identified except for binary data, when they are all fixed to one.
Repeated Measures: If each repeatedly measured factor is measured
by the same indicators (specified in the same order in the
configural.model) on each occasion, without any cross-loadings, the
user can let longIndNames be automatically generated. Generic names
for the repeatedly measured indicators are created using the name of the
repeatedly measured factors (i.e., names(longFacNames)) and the
number of indicators. So the repeatedly measured first indicator
("ind") of a longitudinal construct called "factor" would be
generated as "._factor_ind.1".
The same types of parameter can be specified for long.equal as for
group.equal (see lavaan::lavOptions() for a list), except
for "residual.covariances" or "lv.covariances". Instead, users
can constrain autocovariances using keywords "resid.autocov"
or "lv.autocov". Note that group.equal = "lv.covariances" or
group.equal = "residual.covariances" will constrain any
autocovariances across groups, along with any other covariances the user
specified in the configural.model. Note also that autocovariances
cannot be specified as exceptions in long.partial, so anything more
complex than the auto argument automatically provides should instead
be manually specified in the configural.model.
When users set orthogonal=TRUE in the configural.model (e.g.,
in bifactor models of repeatedly measured constructs), autocovariances of
each repeatedly measured factor will still be freely estimated in the
generated syntax.
Missing Data: If users wish to utilize the auxiliary()
function to automatically include auxiliary variables in conjunction with
missing = "FIML", they should first generate the hypothesized-model
syntax, then submit that syntax as the model to auxiliary().
If users utilized lavaan.mi::lavaan.mi() to fit their configural.model
to multiply imputed data, that model can also be passed to the
configural.model argument, and if return.fit = TRUE, the
generated model will be fitted to the multiple imputations.
Value
By default, an object of class measEq.syntax.
If return.fit = TRUE, a fitted lavaan::lavaan()
model, with the measEq.syntax object stored in the
@external slot, accessible by fit@external$measEq.syntax.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Kloessner, S., & Klopp, E. (2019). Explaining constraint interaction: How to interpret estimated model parameters under alternative scaling methods. Structural Equation Modeling, 26(1), 143–155. doi:10.1080/10705511.2018.1517356
Liu, Y., Millsap, R. E., West, S. G., Tein, J.-Y., Tanaka, R., & Grimm, K. J. (2017). Testing measurement invariance in longitudinal data with ordered-categorical measures. Psychological Methods, 22(3), 486–506. doi:10.1037/met0000075
Millsap, R. E., & Tein, J.-Y. (2004). Assessing factorial invariance in ordered-categorical measures. Multivariate Behavioral Research, 39(3), 479–515. doi:10.1207/S15327906MBR3903_4
Wu, H., & Estabrook, R. (2016). Identification of confirmatory factor analysis models of different levels of invariance for ordered categorical outcomes. Psychometrika, 81(4), 1014–1045. doi:10.1007/s11336-016-9506-0
See Also
Examples
mod.cat <- ' FU1 =~ u1 + u2 + u3 + u4
FU2 =~ u5 + u6 + u7 + u8 '
## the 2 factors are actually the same factor (FU) measured twice
longFacNames <- list(FU = c("FU1","FU2"))
## CONFIGURAL model: no constraints across groups or repeated measures
syntax.config <- measEq.syntax(configural.model = mod.cat,
# NOTE: data provides info about numbers of
# groups and thresholds
data = datCat,
ordered = paste0("u", 1:8),
parameterization = "theta",
ID.fac = "std.lv", ID.cat = "Wu.Estabrook.2016",
group = "g", longFacNames = longFacNames)
## print lavaan syntax to the Console
cat(as.character(syntax.config))
## print a summary of model features
summary(syntax.config)
## THRESHOLD invariance:
## only necessary to specify thresholds if you have no data
mod.th <- '
u1 | t1 + t2 + t3 + t4
u2 | t1 + t2 + t3 + t4
u3 | t1 + t2 + t3 + t4
u4 | t1 + t2 + t3 + t4
u5 | t1 + t2 + t3 + t4
u6 | t1 + t2 + t3 + t4
u7 | t1 + t2 + t3 + t4
u8 | t1 + t2 + t3 + t4
'
syntax.thresh <- measEq.syntax(configural.model = c(mod.cat, mod.th),
# NOTE: data not provided, so syntax must
# include thresholds, and number of
# groups == 2 is indicated by:
sample.nobs = c(1, 1),
parameterization = "theta",
ID.fac = "std.lv", ID.cat = "Wu.Estabrook.2016",
group = "g", group.equal = "thresholds",
longFacNames = longFacNames,
long.equal = "thresholds")
## notice that constraining 4 thresholds allows intercepts and residual
## variances to be freely estimated in all but the first group & occasion
cat(as.character(syntax.thresh))
## print a summary of model features
summary(syntax.thresh)
## Fit a model to the data either in a subsequent step (recommended):
mod.config <- as.character(syntax.config)
fit.config <- cfa(mod.config, data = datCat, group = "g",
ordered = paste0("u", 1:8), parameterization = "theta")
## or in a single step (not generally recommended):
fit.thresh <- measEq.syntax(configural.model = mod.cat, data = datCat,
ordered = paste0("u", 1:8),
parameterization = "theta",
ID.fac = "std.lv", ID.cat = "Wu.Estabrook.2016",
group = "g", group.equal = "thresholds",
longFacNames = longFacNames,
long.equal = "thresholds", return.fit = TRUE)
## compare their fit to test threshold invariance
anova(fit.config, fit.thresh)
## --------------------------------------------------------
## RECOMMENDED PRACTICE: fit one invariance model at a time
## --------------------------------------------------------
## - A downside of setting return.fit=TRUE is that if the model has trouble
## converging, you don't have the opportunity to investigate the syntax,
## or even to know whether an error resulted from the syntax-generator or
## from lavaan itself.
## - A downside of automatically fitting an entire set of invariance models
## (like the old measurementInvariance() function did) is that you might
## end up testing models that shouldn't even be fitted because less
## restrictive models already fail (e.g., don't test full scalar
## invariance if metric invariance fails! Establish partial metric
##   invariance first, then test equivalence of intercepts ONLY among the
##   indicators that have invariant loadings.)
## The recommended sequence is to (1) generate and save each syntax object,
## (2) print it to the screen to verify you are fitting the model you expect
## to (and potentially learn which identification constraints should be
## released when equality constraints are imposed), and (3) fit that model
## to the data, as you would if you had written the syntax yourself.
## Continuing from the examples above, after establishing invariance of
## thresholds, we proceed to test equivalence of loadings and intercepts
## (metric and scalar invariance, respectively)
## simultaneously across groups and repeated measures.
## metric invariance
syntax.metric <- measEq.syntax(configural.model = mod.cat, data = datCat,
ordered = paste0("u", 1:8),
parameterization = "theta",
ID.fac = "std.lv", ID.cat = "Wu.Estabrook.2016",
group = "g", longFacNames = longFacNames,
group.equal = c("thresholds","loadings"),
long.equal = c("thresholds","loadings"))
summary(syntax.metric) # summarize model features
mod.metric <- as.character(syntax.metric) # save as text
cat(mod.metric) # print/view lavaan syntax
## fit model to data
fit.metric <- cfa(mod.metric, data = datCat, group = "g",
ordered = paste0("u", 1:8), parameterization = "theta")
## test equivalence of loadings, given equivalence of thresholds
anova(fit.thresh, fit.metric)
## scalar invariance
syntax.scalar <- measEq.syntax(configural.model = mod.cat, data = datCat,
ordered = paste0("u", 1:8),
parameterization = "theta",
ID.fac = "std.lv", ID.cat = "Wu.Estabrook.2016",
group = "g", longFacNames = longFacNames,
group.equal = c("thresholds","loadings",
"intercepts"),
long.equal = c("thresholds","loadings",
"intercepts"))
summary(syntax.scalar) # summarize model features
mod.scalar <- as.character(syntax.scalar) # save as text
cat(mod.scalar) # print/view lavaan syntax
## fit model to data
fit.scalar <- cfa(mod.scalar, data = datCat, group = "g",
ordered = paste0("u", 1:8), parameterization = "theta")
## test equivalence of intercepts, given equal thresholds & loadings
anova(fit.metric, fit.scalar)
## For a single table with all results, you can pass the models to
## summarize to the compareFit() function
Comparisons <- compareFit(fit.config, fit.thresh, fit.metric, fit.scalar)
summary(Comparisons)
## ------------------------------------------------------
## NOT RECOMMENDED: fit several invariance models at once
## ------------------------------------------------------
test.seq <- c("thresholds","loadings","intercepts","means","residuals")
meq.list <- list()
for (i in 0:length(test.seq)) {
if (i == 0L) {
meq.label <- "configural"
group.equal <- ""
long.equal <- ""
} else {
meq.label <- test.seq[i]
group.equal <- test.seq[1:i]
long.equal <- test.seq[1:i]
}
meq.list[[meq.label]] <- measEq.syntax(configural.model = mod.cat,
data = datCat,
ordered = paste0("u", 1:8),
parameterization = "theta",
ID.fac = "std.lv",
ID.cat = "Wu.Estabrook.2016",
group = "g",
group.equal = group.equal,
longFacNames = longFacNames,
long.equal = long.equal,
return.fit = TRUE)
}
evalMeasEq <- compareFit(meq.list)
summary(evalMeasEq)
## -----------------
## Binary indicators
## -----------------
## borrow example data from Mplus user guide
myData <- read.table("https://www.statmodel.com/usersguide/chap5/ex5.16.dat")
names(myData) <- c("u1","u2","u3","u4","u5","u6","x1","x2","x3","g")
bin.mod <- '
FU1 =~ u1 + u2 + u3
FU2 =~ u4 + u5 + u6
'
## Must SIMULTANEOUSLY constrain thresholds, loadings, and intercepts
test.seq <- list(strong = c("thresholds","loadings","intercepts"),
means = "means",
strict = "residuals")
meq.list <- list()
for (i in 0:length(test.seq)) {
if (i == 0L) {
meq.label <- "configural"
group.equal <- ""
long.equal <- ""
} else {
meq.label <- names(test.seq)[i]
group.equal <- unlist(test.seq[1:i])
# long.equal <- unlist(test.seq[1:i])
}
meq.list[[meq.label]] <- measEq.syntax(configural.model = bin.mod,
data = myData,
ordered = paste0("u", 1:6),
parameterization = "theta",
ID.fac = "std.lv",
ID.cat = "Wu.Estabrook.2016",
group = "g",
group.equal = group.equal,
#longFacNames = longFacNames,
#long.equal = long.equal,
return.fit = TRUE)
}
evalMeasEq <- compareFit(meq.list)
summary(evalMeasEq)
## ---------------------
## Multilevel Invariance
## ---------------------
## To test invariance across levels in a MLSEM, specify syntax as though
## you are fitting to 2 groups instead of 2 levels.
mlsem <- ' f1 =~ y1 + y2 + y3
f2 =~ y4 + y5 + y6 '
## metric invariance
syntax.metric <- measEq.syntax(configural.model = mlsem, meanstructure = TRUE,
ID.fac = "std.lv", sample.nobs = c(1, 1),
group = "cluster", group.equal = "loadings")
## by definition, Level-1 means must be zero, so fix them
syntax.metric <- update(syntax.metric,
change.syntax = paste0("y", 1:6, " ~ c(0, NA)*1"))
## save as a character string
mod.metric <- as.character(syntax.metric, groups.as.blocks = TRUE)
## convert from multigroup to multilevel
mod.metric <- gsub(pattern = "group:", replacement = "level:",
x = mod.metric, fixed = TRUE)
## fit model to data
fit.metric <- lavaan(mod.metric, data = Demo.twolevel, cluster = "cluster")
summary(fit.metric)
Class for Representing a Measurement-Equivalence Model
Description
This class of object stores information used to automatically generate
lavaan model syntax to represent user-specified levels of measurement
equivalence/invariance across groups and/or repeated measures. See
measEq.syntax() for details.
Usage
## S4 method for signature 'measEq.syntax'
as.character(x, package = "lavaan",
params = NULL, single = TRUE, groups.as.blocks = FALSE)
## S4 method for signature 'measEq.syntax'
show(object)
## S4 method for signature 'measEq.syntax'
summary(object, verbose = TRUE)
## S4 method for signature 'measEq.syntax'
update(object, ..., evaluate = TRUE,
change.syntax = NULL)
Arguments
x, object |
an object of class |
package |
|
params |
|
single |
|
groups.as.blocks |
|
verbose |
|
... |
Additional arguments to the |
evaluate |
If |
change.syntax |
|
Value
summary |
|
show |
|
update |
|
as.character |
|
Slots
- package: character indicating the software package used to represent the model. Currently, only "lavaan" is available, which uses the LISREL representation (see lavaan::lavOptions()). In the future, "OpenMx" may become available, using RAM representation.
- model.type: character. Currently, only "cfa" is available. Future versions may allow for MIMIC / RFA models, where invariance can be tested across levels of exogenous variables explicitly included as predictors of indicators, controlling for their effects on (or correlation with) the common factors.
- call: The function call as returned by match.call(), with some arguments updated if necessary for logical consistency.
- meanstructure: logical indicating whether a mean structure is included in the model.
- numeric: character vector naming numeric manifest indicators.
- ordered: character vector naming ordered indicators.
- parameterization: character. See lavaan::lavOptions().
- specify: list of parameter matrices, similar in form to the output of lavInspect(fit, "free"). These matrices are logical, indicating whether each parameter should be specified in the model syntax.
- values: list of parameter matrices, similar in form to the output of lavInspect(fit, "free"). These matrices are numeric, indicating whether each parameter should be freely estimated (indicated by NA) or fixed to a particular value.
- labels: list of parameter matrices, similar in form to the output of lavInspect(fit, "free"). These matrices contain character labels used to constrain parameters to equality.
- constraints: character vector containing additional equality constraints used to identify the model when ID.fac = "fx".
- ngroups: integer indicating the number of groups.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
Examples
## See ?measEq.syntax help page for examples using lavaan
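## A brief sketch of the methods above (not run), assuming the syntax.config
## object created in the ?measEq.syntax examples is available; the constraint
## string passed to change.syntax is purely illustrative:
# summary(syntax.config)                    # describe model features
# cat(as.character(syntax.config))          # export lavaan model syntax
# syntax.new <- update(syntax.config,
#                      change.syntax = "FU1 =~ c(NA, 1)*u2")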
Monte Carlo Confidence Intervals to Test Functions of Parameter Estimates
Description
Robust confidence intervals for functions of parameter estimates, based on empirical sampling distributions of estimated model parameters.
Usage
monteCarloCI(object = NULL, expr, coefs, ACM, nRep = 20000,
standardized = FALSE, fast = TRUE, level = 0.95, na.rm = TRUE,
append.samples = FALSE, plot = FALSE,
ask = getOption("device.ask.default"), ...)
Arguments
object |
An object of class lavaan::lavaan in which
functions of parameters have already been defined using the |
expr |
Optional |
coefs |
|
ACM |
Symmetric |
nRep |
|
standardized |
|
fast |
|
level |
|
na.rm |
|
append.samples |
|
plot |
|
ask |
whether to prompt user before printing each plot |
... |
arguments passed to |
Details
This function implements the Monte Carlo method of obtaining an empirical
sampling distribution of estimated model parameters, as described by
MacKinnon et al. (2004) for testing indirect effects in mediation models.
This is essentially a parametric bootstrap method, which (re)samples
parameters (rather than raw data) from a multivariate-normal distribution
with mean vector equal to estimates in coef() and covariance matrix
equal to the asymptotic covariance matrix vcov() of estimated parameters.
The easiest way to use the function is to fit a SEM to data with
lavaan::lavaan(), using the := operator in the
lavaan::model.syntax() to specify user-defined parameters.
All information is then available in the resulting
lavaan::lavaan object. Alternatively (especially when using
external SEM software to fit the model), the expression(s) can be explicitly
passed to the function, along with the vector of estimated model parameters
and their associated asymptotic sampling covariance matrix (ACOV).
For further information on the Monte Carlo method, see MacKinnon et al.
(2004) and Preacher & Selig (2012).
The asymptotic covariance matrix can be obtained easily from many popular SEM software packages.
LISREL: Including the EC option on the OU line will print the ACM to a separate file. The file contains the lower-triangular elements of the ACM in free format and scientific notation.
Mplus: Include the command TECH3; in the OUTPUT section. The ACM will be printed in the output.
lavaan: Use the vcov() method on the fitted lavaan::lavaan object to return the ACM.
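As a rough sketch of the method described above (not the internal implementation), parameters can be drawn manually and any function of them summarized. This assumes a fitted lavaan model fit with parameters labeled a and b (as in the Examples below) and uses the suggested MASS package:
nRep  <- 20000
draws <- MASS::mvrnorm(nRep, mu = coef(fit), Sigma = vcov(fit))
ind   <- draws[, "a"] * draws[, "b"]       # indirect effect in each draw
quantile(ind, c(.025, .975))               # 95% Monte Carlo confidence limits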
Value
A lavaan.data.frame (to use lavaan's print method)
with point estimates and confidence limits of each requested function of
parameters in expr is returned. If append.samples = TRUE,
output will be a list with the same $Results along with a
second data.frame with the $Samples (in rows) of each
parameter (in columns), and an additional column for each requested
function of those parameters.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
MacKinnon, D. P., Lockwood, C. M., & Williams, J. (2004). Confidence limits for the indirect effect: Distribution of the product and resampling methods. Multivariate Behavioral Research, 39(1), 99–128. doi:10.1207/s15327906mbr3901_4
Preacher, K. J., & Selig, J. P. (2010, July). Monte Carlo method for assessing multilevel mediation: An interactive tool for creating confidence intervals for indirect effects in 1-1-1 multilevel models. Computer software available from https://quantpsy.org/.
Preacher, K. J., & Selig, J. P. (2012). Advantages of Monte Carlo confidence intervals for indirect effects. Communication Methods and Measures, 6(2), 77–98. doi:10.1080/19312458.2012.679848
Selig, J. P., & Preacher, K. J. (2008, June). Monte Carlo method for assessing mediation: An interactive tool for creating confidence intervals for indirect effects. Computer software available from https://quantpsy.org/.
Examples
## From the mediation tutorial:
## https://lavaan.ugent.be/tutorial/mediation.html
set.seed(1234)
X <- rnorm(100)
M <- 0.5*X + rnorm(100)
Y <- 0.7*M + rnorm(100)
dat <- data.frame(X = X, Y = Y, M = M)
mod <- ' # direct effect
Y ~ c*X
# mediator
M ~ a*X
Y ~ b*M
# indirect effect (a*b)
ind := a*b
# total effect
total := ind + c
'
fit <- sem(mod, data = dat)
summary(fit, ci = TRUE) # print delta-method CIs
## Automatically extract information from lavaan object
set.seed(1234)
monteCarloCI(fit) # CIs more robust than delta method in smaller samples
## delta method for standardized solution
standardizedSolution(fit)
## compare to Monte Carlo CIs:
set.seed(1234)
monteCarloCI(fit, standardized = TRUE)
## save samples to calculate more precise intervals:
set.seed(1234)
foo <- monteCarloCI(fit, append.samples = TRUE)
# library(HDInterval) # not a dependency; must be installed
# hdi(foo$Samples)
## Parameters can also be obtained from an external analysis
myParams <- c("a","b","c")
(coefs <- coef(fit)[myParams]) # names must match those in the "expression"
## Asymptotic covariance matrix from an external analysis
(AsyCovMat <- vcov(fit)[myParams, myParams])
## Compute CI, include a plot
set.seed(1234)
monteCarloCI(expr = c(ind = 'a*b', total = 'ind + c',
## other arbitrary functions are also possible
meaningless = 'sqrt(a)^b / log(abs(c))'),
coefs = coefs, ACM = AsyCovMat,
plot = TRUE, ask = TRUE) # print a plot for each
Calculate more fit indices
Description
Calculate more fit indices that are not already provided in lavaan.
Usage
moreFitIndices(object, fit.measures = "all", nPrior = 1)
Arguments
object |
The lavaan model object provided after running the |
fit.measures |
Additional fit measures to be calculated. All additional fit measures are calculated by default |
nPrior |
The sample size on which the prior is based. This argument is used
to compute |
Details
See nullRMSEA() for the further details of the computation of
RMSEA of the null model.
Gamma-Hat (gammaHat; West, Taylor, & Wu, 2012) is a global
goodness-of-fit index which can be computed (assuming equal number of
indicators across groups) by
\hat{\Gamma} =\frac{p}{p + 2 \times \frac{\chi^{2}_{k} - df_{k}}{N}},
where p is the number of variables in the model, \chi^{2}_{k} is
the \chi^2 test statistic value of the target model, df_{k} is
the degrees of freedom when fitting the target model, and N is the
sample size (or sample size minus the number of groups if mimic is
set to "EQS").
Adjusted Gamma-Hat (adjGammaHat; West, Taylor, & Wu, 2012) is a
global fit index which can be computed by
\hat{\Gamma}_\textrm{adj} = \left(1 - \frac{K \times p \times
(p + 1)}{2 \times df_{k}} \right) \times \left( 1 - \hat{\Gamma} \right),
where K is the number of groups (please refer to Dudgeon, 2004, for
the multiple-group adjustment for adjGammaHat).
Note that if Satorra–Bentler's or Yuan–Bentler's method is used, the fit
indices using the scaled \chi^2 values are also provided.
The remaining indices are information criteria calculated using the
object's -2 \times log-likelihood, abbreviated -2LL.
Corrected Akaike Information Criterion (aic.smallN; Burnham &
Anderson, 2003) is a corrected version of AIC for small sample size, often
abbreviated AICc:
\textrm{AIC}_{\textrm{small}-N} = AIC + \frac{2q(q + 1)}{N - q - 1},
where AIC is the original AIC: -2LL + 2q (where q
= the number of estimated parameters in the target model). Note that AICc is
a small-sample correction derived for univariate regression models, so it is
probably not appropriate for comparing SEMs.
Corrected Bayesian Information Criterion (bic.priorN; Kuha, 2004) is
similar to BIC but explicitly specifying the sample size on which the prior
is based (N_{prior}) using the nPrior argument.
\textrm{BIC}_{\textrm{prior}-N} = -2LL + q\log{( 1 + \frac{N}{N_{prior}} )}.
Bollen et al. (2012, 2014) discussed additional BICs that incorporate more
terms from a Taylor series expansion, which the standard BIC drops. The
"Scaled Unit-Information Prior" BIC is calculated depending on whether the
product of the vector of estimated model parameters (\hat{\theta}) and
the observed information matrix (FIM) exceeds the number of estimated model
parameters (Case 1) or not (Case 2), which is checked internally:
\textrm{SPBIC}_{\textrm{Case 1}} = -2LL + q(1 - \frac{q}{\hat{\theta}^{'} \textrm{FIM} \hat{\theta}}), \textrm{ or}
\textrm{SPBIC}_{\textrm{Case 2}} = -2LL + \hat{\theta}^{'} \textrm{FIM} \hat{\theta},
Note that this implementation of SPBIC is calculated on the assumption that priors for all estimated parameters are centered at zero, which is inappropriate for most SEMs (e.g., variances should not have priors centered at the lowest possible value; Bollen, 2014, p. 6).
Bollen et al. (2014, eq. 14) credit the HBIC to Haughton (1988):
\textrm{HBIC} = -2LL + q\log{\frac{N}{2 \pi}}.
Bollen et al. (2012, p. 305) proposed the information matrix (I)-based BIC by
adding another term:
\textrm{IBIC} = -2LL + q\log{\frac{N}{2 \pi}} + \log{\det{\textrm{FIM}}},
or equivalently, using the inverse information (the asymptotic sampling covariance matrix of estimated parameters: ACOV):
\textrm{IBIC} = -2LL - q\log{2 \pi} - \log{\det{\textrm{ACOV}}}.
Stochastic information criterion (SIC; see Preacher, 2006, for details) is
similar to IBIC but does not include the q\log{2 \pi} term that is
also in HBIC. SIC and IBIC both account for model complexity in a model's
functional form, not merely the number of free parameters. The SIC can be
computed as:
\textrm{SIC} = -2LL + q\log{N} + \log{\det{\textrm{FIM}}} = -2LL - \log{\det{\textrm{ACOV}}}.
Hannan–Quinn Information Criterion (HQC; Hannan & Quinn, 1979) is used for model selection, similar to AIC or BIC.
\textrm{HQC} = -2LL + 2q\log{(\log{N})},
Bozdogan Information Complexity (ICOMP) Criteria (Howe et al., 2011), instead of penalizing the number of free parameters directly, ICOMP penalizes the covariance complexity of the model.
\textrm{ICOMP} = -2LL + s \times log(\frac{\bar{\lambda_a}}{\bar{\lambda_g}})
Value
A numeric lavaan.vector including any of the
following requested via fit.measures=
- gammaHat: Gamma-Hat
- adjGammaHat: Adjusted Gamma-Hat
- baseline.rmsea: RMSEA of the default baseline (i.e., independence) model
- gammaHat.scaled: Gamma-Hat using scaled \chi^2
- adjGammaHat.scaled: Adjusted Gamma-Hat using scaled \chi^2
- baseline.rmsea.scaled: RMSEA of the default baseline (i.e., independence) model using scaled \chi^2
- aic.smallN: Corrected (for small sample size) AIC
- bic.priorN: BIC with specified prior sample size
- spbic: Scaled Unit-Information Prior BIC (SPBIC)
- hbic: Haughton's BIC (HBIC)
- ibic: Information-matrix-based BIC (IBIC)
- sic: Stochastic Information Criterion (SIC)
- hqc: Hannan–Quinn Information Criterion (HQC)
- icomp: Bozdogan Information Complexity (ICOMP) Criteria
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
Aaron Boulton (University of Delaware)
Ruben Arslan (Humboldt-University of Berlin, rubenarslan@gmail.com)
Yves Rosseel (Ghent University; Yves.Rosseel@UGent.be)
Mauricio Garnier-Villarreal (Vrije Universiteit Amsterdam; mgv@pm.me)
A great deal of feedback was provided by Kris Preacher regarding Bollen et al.'s (2012, 2014) extensions of BIC.
References
Bollen, K. A., Ray, S., Zavisca, J., & Harden, J. J. (2012). A comparison of Bayes factor approximation methods including two new methods. Sociological Methods & Research, 41(2), 294–324. doi:10.1177/0049124112452393
Bollen, K. A., Harden, J. J., Ray, S., & Zavisca, J. (2014). BIC and alternative Bayesian information criteria in the selection of structural equation models. Structural Equation Modeling, 21(1), 1–19. doi:10.1080/10705511.2014.856691
Burnham, K., & Anderson, D. (2003). Model selection and multimodel inference: A practical–theoretic approach. New York, NY: Springer–Verlag.
Dudgeon, P. (2004). A note on extending Steiger's (1998) multiple sample RMSEA adjustment to other noncentrality parameter-based statistic. Structural Equation Modeling, 11(3), 305–319. doi:10.1207/s15328007sem1103_1
Howe, E. D., Bozdogan, H., & Katragadda, S. (2011). Structural equation modeling (SEM) of categorical and mixed-data using the novel Gifi transformations and information complexity (ICOMP) criterion. Istanbul University Journal of the School of Business Administration, 40(1), 86–123.
Kuha, J. (2004). AIC and BIC: Comparisons of assumptions and performance. Sociological Methods Research, 33(2), 188–229. doi:10.1177/0049124103262065
Preacher, K. J. (2006). Quantifying parsimony in structural equation modeling. Multivariate Behavioral Research, 41(3), 227–259. doi:10.1207/s15327906mbr4103_1
West, S. G., Taylor, A. B., & Wu, W. (2012). Model fit and model selection in structural equation modeling. In R. H. Hoyle (Ed.), Handbook of structural equation modeling (pp. 209–231). New York, NY: Guilford.
See Also
- epcEquivFit(): For equivalence testing based on expected parameter changes for model fit evaluation
- nullRMSEA(): For RMSEA of the default independence model
Examples
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '
fit <- cfa(HS.model, data = HolzingerSwineford1939)
moreFitIndices(fit)
fit2 <- cfa(HS.model, data = HolzingerSwineford1939, estimator = "mlr")
moreFitIndices(fit2)
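## As a check on the Gamma-Hat formula given in the Details, it can be
## reproduced by hand from lavaan's fit measures (single-group ML example;
## results may differ slightly under other mimic settings):
fm <- fitMeasures(fit, c("chisq", "df", "ntotal"))
p  <- length(lavNames(fit, "ov"))           # number of observed variables
unname(p / (p + 2 * (fm["chisq"] - fm["df"]) / fm["ntotal"]))
## compare with moreFitIndices(fit)["gammaHat"]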
Generate Non-normal Data using Vale and Maurelli (1983) method
Description
Generate non-normal data using the Vale and Maurelli (1983) method. The
function is designed to be as similar as possible to the popular
mvrnorm function in the MASS package. The code for argument checking
was adapted from the mvrnorm function in the MASS package, and the
code for data generation using the Vale and Maurelli (1983) method was
adapted from the lavaan package.
Usage
mvrnonnorm(n, mu, Sigma, skewness = NULL, kurtosis = NULL,
empirical = FALSE)
Arguments
n |
Sample size |
mu |
A mean vector. If elements are named, those will be used as variable names in the returned data matrix. |
Sigma |
A positive-definite symmetric matrix specifying the covariance
matrix of the variables. If rows or columns are named (and |
skewness |
A vector of skewness of the variables |
kurtosis |
A vector of excess kurtosis values of the variables |
empirical |
deprecated, ignored. |
Value
A data matrix
Author(s)
The original function is the lavaan::simulateData()
function written by Yves Rosseel in the lavaan package. The function
is adjusted for a convenient usage by Sunthud Pornprasertmanit
(psunthud@gmail.com). Terrence D. Jorgensen added the feature to
retain variable names from mu or Sigma.
References
Vale, C. D. & Maurelli, V. A. (1983). Simulating multivariate nonnormal distributions. Psychometrika, 48(3), 465–471. doi:10.1007/BF02293687
Examples
set.seed(123)
mvrnonnorm(20, c(1, 2), matrix(c(10, 2, 2, 5), 2, 2),
skewness = c(5, 2), kurtosis = c(3, 3))
## again, with variable names specified in mu
set.seed(123)
mvrnonnorm(20, c(a = 1, b = 2), matrix(c(10, 2, 2, 5), 2, 2),
skewness = c(5, 2), kurtosis = c(3, 3))
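## As a quick (illustrative) check that generated data approximate the
## requested moments, compare univariate descriptives from this package's
## skew() and kurtosis() functions against the arguments, using a larger
## sample to reduce Monte Carlo error:
set.seed(123)
bigSample <- mvrnonnorm(10000, c(1, 2), matrix(c(10, 2, 2, 5), 2, 2),
                        skewness = c(5, 2), kurtosis = c(3, 3))
apply(bigSample, 2, skew)      # compare with skewness = c(5, 2)
apply(bigSample, 2, kurtosis)  # compare with kurtosis = c(3, 3)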
Nesting and Equivalence Testing
Description
This test examines whether pairs of SEMs are nested or equivalent.
Usage
net(..., crit = 1e-04)
Arguments
... |
The |
crit |
The upper-bound criterion for testing the equivalence of models.
Models are considered nested (or equivalent) if the difference between
their |
Details
The concept of nesting/equivalence should be the same regardless of
estimation method. However, the particular method of testing
nesting/equivalence (as described in Bentler & Satorra, 2010) employed by
the net function analyzes summary statistics (model-implied means and
covariance matrices, not raw data). In the case of robust methods like MLR,
the raw data are only utilized for the robust adjustment to the standard
errors and test statistic, and the net function only checks the unadjusted
chi-squared statistic for the purposes of
testing nesting/equivalence. This method also applies to models for
categorical data, following the procedure described by Asparouhov & Muthen
(2019).
Value
A Net object representing the results of nesting and equivalence testing, including a logical matrix of test results and a vector of degrees of freedom for each model.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Bentler, P. M., & Satorra, A. (2010). Testing model nesting and equivalence. Psychological Methods, 15(2), 111–123. doi:10.1037/a0019625
Asparouhov, T., & Muthen, B. (2019). Nesting and equivalence testing for structural equation models. Structural Equation Modeling, 26(2), 302–309. doi:10.1080/10705511.2018.1513795
Examples
m1 <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '
m2 <- ' f1 =~ x1 + x2 + x3 + x4
f2 =~ x5 + x6 + x7 + x8 + x9 '
m3 <- ' visual =~ x1 + x2 + x3
textual =~ eq*x4 + eq*x5 + eq*x6
speed =~ x7 + x8 + x9 '
fit1 <- cfa(m1, data = HolzingerSwineford1939)
fit1a <- cfa(m1, data = HolzingerSwineford1939, std.lv = TRUE) # Equivalent to fit1
fit2 <- cfa(m2, data = HolzingerSwineford1939) # Not equivalent to or nested in fit1
fit3 <- cfa(m3, data = HolzingerSwineford1939) # Nested in fit1 and fit1a
tests <- net(fit1, fit1a, fit2, fit3)
tests
summary(tests)
Calculate the RMSEA of the null model
Description
Calculate the RMSEA of the null (baseline) model
Usage
nullRMSEA(object, scaled = FALSE, silent = FALSE)
Arguments
object |
The lavaan model object provided after running the |
scaled |
If |
silent |
If |
Details
RMSEA of the null model is calculated similarly to the formula used in the
lavaan package. The standard formula of RMSEA is
RMSEA =\sqrt{\frac{\chi^2}{N \times df} - \frac{1}{N}} \times
\sqrt{G}
where \chi^2 is the chi-square test statistic value of the target
model, N is the total sample size, df is the degrees of freedom
of the hypothesized model, and G is the number of groups. Kenny proposed
on his website that
"A reasonable rule of thumb is to examine the RMSEA for the null model and make sure that is no smaller than 0.158. An RMSEA for the model of 0.05 and a TLI of .90, implies that the RMSEA of the null model is 0.158. If the RMSEA for the null model is less than 0.158, an incremental measure of fit may not be that informative."
See also the paper cited in References.
Value
A value of RMSEA of the null model (a numeric vector)
returned invisibly.
Author(s)
Ruben Arslan (Humboldt-University of Berlin, rubenarslan@gmail.com)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Kenny, D. A., Kaniskan, B., & McCoach, D. B. (2015). The performance of RMSEA in models with small degrees of freedom. Sociological Methods Research, 44(3), 486–507. doi:10.1177/0049124114543236
See Also
- miPowerFit(): For the modification indices and their power approach for model fit evaluation
- moreFitIndices(): For other fit indices
Examples
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '
fit <- cfa(HS.model, data = HolzingerSwineford1939)
nullRMSEA(fit)
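## The formula in the Details can also be reproduced by hand from lavaan's
## baseline-model fit measures (single-group example, so G = 1):
bl <- fitMeasures(fit, c("baseline.chisq", "baseline.df", "ntotal"))
unname(sqrt(max(0, bl["baseline.chisq"] / (bl["ntotal"] * bl["baseline.df"]) -
                   1 / bl["ntotal"])))
## compare with the value printed by nullRMSEA(fit)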
Random Allocation of Items to Parcels in a Structural Equation Model
Description
This function generates a specified number of random item-to-parcel allocations, fits a model to each allocation, and provides results averaged over all allocations.
Usage
parcelAllocation(model, data, parcel.names, item.syntax, nAlloc = 100,
fun = "sem", alpha = 0.05, fit.measures = c("chisq", "df", "cfi",
"tli", "rmsea", "srmr"), ..., show.progress = FALSE, iseed = 12345,
do.fit = TRUE, return.fit = FALSE, warn = FALSE)
Arguments
model |
|
data |
A |
parcel.names |
|
item.syntax |
|
nAlloc |
The number of random items-to-parcels allocations to generate. |
fun |
|
alpha |
Alpha level used as criterion for significance. |
fit.measures |
|
... |
Additional arguments to be passed to
|
show.progress |
If |
iseed |
(Optional) Random seed used for parceling items. When the same
random seed is specified and the program is re-run, the same allocations
will be generated. Using the same |
do.fit |
If |
return.fit |
If |
warn |
Whether to print warnings when fitting |
Details
This function implements the random item-to-parcel allocation procedure
described in Sterba (2011) and Sterba and MacCallum (2010). The function
takes a single data set with item-level data, randomly assigns items to
parcels, fits a structural equation model to the parceled data using
lavaan::lavaanList(), and repeats this process for a user-specified
number of random allocations. Results from all fitted models are summarized
in the output. For further details on the benefits of randomly allocating
items to parcels, see Sterba (2011) and Sterba and MacCallum (2010).
Value
Estimates |
A |
SE |
A |
Fit |
A |
Model |
A lavaan::lavaanList object containing results
of the |
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Sterba, S. K. (2011). Implications of parcel-allocation variability for comparing fit of item-solutions and parcel-solutions. Structural Equation Modeling, 18(4), 554–577. doi:10.1080/10705511.2011.607073
Sterba, S. K. & MacCallum, R. C. (2010). Variability in parameter estimates and model fit across random allocations of items to parcels. Multivariate Behavioral Research, 45(2), 322–358. doi:10.1080/00273171003680302
Sterba, S. K., & Rights, J. D. (2016). Accounting for parcel-allocation variability in practice: Combining sources of uncertainty and choosing the number of allocations. Multivariate Behavioral Research, 51(2–3), 296–313. doi:10.1080/00273171.2016.1144502
Sterba, S. K., & Rights, J. D. (2017). Effects of parceling on model selection: Parcel-allocation variability in model ranking. Psychological Methods, 22(1), 47–68. doi:10.1037/met0000067
See Also
PAVranking() for comparing 2 models,
poolMAlloc() for choosing the number of allocations
Examples
## Fit 2-factor CFA to simulated data. Each factor has 9 indicators.
## Specify the item-level model (if NO parcels were created)
item.syntax <- c(paste0("f1 =~ f1item", 1:9),
paste0("f2 =~ f2item", 1:9))
cat(item.syntax, sep = "\n")
## Below, we reduce the size of this same model by
## applying different parceling schemes
## 3-indicator parcels
mod.parcels <- '
f1 =~ par1 + par2 + par3
f2 =~ par4 + par5 + par6
'
## names of parcels
(parcel.names <- paste0("par", 1:6))
## override default random-number generator to use parallel options
RNGkind("L'Ecuyer-CMRG")
parcelAllocation(mod.parcels, data = simParcel, nAlloc = 100,
parcel.names = parcel.names, item.syntax = item.syntax,
# parallel = "multicore", # parallel available in Mac/Linux
std.lv = TRUE) # any additional lavaan arguments
## POOL RESULTS by treating parcel allocations as multiple imputations
## Details provided in Sterba & Rights (2016); see ?poolMAlloc.
## save list of data sets instead of fitting model yet
dataList <- parcelAllocation(mod.parcels, data = simParcel, nAlloc = 100,
parcel.names = parcel.names,
item.syntax = item.syntax,
do.fit = FALSE)
## now fit the model to each data set
library(lavaan.mi)
fit.parcels <- cfa.mi(mod.parcels, data = dataList, std.lv = TRUE)
summary(fit.parcels) # pooled using Rubin's rules
anova(fit.parcels) # pooled test statistic
help(package = "lavaan.mi") # find more methods for pooling results
## multigroup example
simParcel$group <- 0:1 # arbitrary groups for example
mod.mg <- '
f1 =~ par1 + c(L2, L2)*par2 + par3
f2 =~ par4 + par5 + par6
'
## names of parcels
(parcel.names <- paste0("par", 1:6))
parcelAllocation(mod.mg, data = simParcel, parcel.names, item.syntax,
std.lv = TRUE, group = "group", group.equal = "loadings",
nAlloc = 20, show.progress = TRUE)
## parcels for first factor, items for second factor
mod.items <- '
f1 =~ par1 + par2 + par3
f2 =~ f2item2 + f2item7 + f2item8
'
## names of parcels
(parcel.names <- paste0("par", 1:3))
parcelAllocation(mod.items, data = simParcel, parcel.names, item.syntax,
nAlloc = 20, std.lv = TRUE)
## mixture of 1- and 3-indicator parcels for second factor
mod.mix <- '
f1 =~ par1 + par2 + par3
f2 =~ f2item2 + f2item7 + f2item8 + par4 + par5 + par6
'
## names of parcels
(parcel.names <- paste0("par", 1:6))
parcelAllocation(mod.mix, data = simParcel, parcel.names, item.syntax,
nAlloc = 20, std.lv = TRUE)
Partial Measurement Invariance Testing Across Groups
Description
These functions provide partial invariance testing by (a) freeing one
parameter at a time from the nested model and comparing it with the original
nested model, or (b) fixing (or constraining) one parameter at a time in the
parent model and comparing it with the original parent model. These functions
only work with congeneric models. partialInvariance() is used for continuous
variables; partialInvarianceCat() is used for categorical variables.
Usage
partialInvariance(fit, type, free = NULL, fix = NULL, refgroup = 1,
poolvar = TRUE, p.adjust = "none", fbound = 2, return.fit = FALSE,
method = "satorra.bentler.2001")
partialInvarianceCat(fit, type, free = NULL, fix = NULL, refgroup = 1,
poolvar = TRUE, p.adjust = "none", return.fit = FALSE,
method = "satorra.bentler.2001")
Arguments
fit |
A list of models for invariance testing. Each model should be assigned by appropriate names (see details). |
type |
The types of invariance testing: "metric", "scalar", "strict", or "means" |
free |
A vector of variable names that are free across groups in advance. If partial mean invariance is tested, this argument represents a vector of factor names that are free across groups. |
fix |
A vector of variable names that are constrained to be equal across groups in advance. If partial mean invariance is tested, this argument represents a vector of factor names that are fixed across groups. |
refgroup |
The reference group used to make the effect size comparison with the other groups. |
poolvar |
If |
p.adjust |
The method used to adjust p values. See
|
fbound |
The z-score bound of the factor used to calculate the effect size of the loading difference proposed by Millsap and Olivera-Aguilar (2012). |
return.fit |
Return the submodels fitted by this function |
method |
The method used to calculate likelihood ratio test. See
|
Details
There are four types of partial invariance testing:
- Partial weak invariance. The model named fit.configural from the list of models is compared with the model named fit.loadings. Each loading will be freed or fixed from the metric and configural invariance models, respectively. The modified models are compared with the original model. Note that the objects in the list of models must have the names "fit.configural" and "fit.loadings". Users may use "metric", "weak", "loading", or "loadings" in the type argument. Note that, for testing invariance on marker variables, other variables will be assigned as marker variables automatically.
- Partial strong invariance. The model named fit.loadings from the list of models is compared with the model named either fit.intercepts or fit.thresholds. Each intercept will be freed or fixed from the scalar and metric invariance models, respectively. The modified models are compared with the original model. Note that the objects in the list of models must have the names "fit.loadings" and either "fit.intercepts" or "fit.thresholds". Users may use "scalar", "strong", "intercept", "intercepts", "threshold", or "thresholds" in the type argument. Note that, for testing invariance on marker variables, other variables will be assigned as marker variables automatically. Note that if all variables are dichotomous, scalar invariance testing is not available.
- Partial strict invariance. The model named either fit.intercepts or fit.thresholds (or fit.loadings) from the list of models is compared with the model named fit.residuals. Each residual variance will be freed or fixed from the strict and scalar (or metric) invariance models, respectively. The modified models are compared with the original model. Note that the objects in the list of models must have the names "fit.residuals" and either "fit.intercepts", "fit.thresholds", or "fit.loadings". Users may use "strict", "residual", "residuals", "error", or "errors" in the type argument.
- Partial mean invariance. The model named either fit.intercepts or fit.thresholds (or fit.residuals or fit.loadings) from the list of models is compared with the model named fit.means. Each factor mean will be freed or fixed from the means and scalar (or strict or metric) invariance models, respectively. The modified models are compared with the original model. Note that the objects in the list of models must have the names "fit.means" and either "fit.residuals", "fit.intercepts", "fit.thresholds", or "fit.loadings". Users may use "means" or "mean" in the type argument.
Two types of comparisons are used in this function:
- free: The nested model is used as a template. Then, one parameter indicating a difference between the two models is freed. The new model is compared with the nested model. This process is repeated for all differences between the two models. The likelihood-ratio test and the difference in CFI are provided.
- fix: The parent model is used as a template. Then, one parameter indicating a difference between the two models is fixed or constrained to be equal to other parameters. The new model is then compared with the parent model. This process is repeated for all differences between the two models. The likelihood-ratio test and the difference in CFI are provided.
- wald: This method is similar to the fix method. However, instead of building a new model and comparing it with a likelihood-ratio test, a multivariate Wald test is used to test equality of parameter estimates. See lavaan::lavTestWald() for further details. Note that if any row of the contrast matrix cannot sum to 0 (e.g., when comparing two means and one of the means is fixed to 0), the Wald test is not provided. This test statistic is not as accurate as the likelihood-ratio test provided by fix; it is provided here in case the likelihood-ratio test fails to converge.
Note that this function does not adjust for the inflated Type I error rate from multiple tests. The degrees of freedom of all tests equal the number of groups minus 1.
The details of standardized estimates and the effect size used for each
parameters are provided in the vignettes by running
vignette("partialInvariance").
Value
A list of results is provided, consisting of at least two elements:
- estimates: The parameter estimates, including pooled estimates (poolest), the estimates for each group, standardized estimates for each group (std), the difference in standardized values, and the effect size statistic (q for factor-loading differences and h for error-variance differences). See the details of these effect size statistics by running vignette("partialInvariance"). In the partialInvariance function, the additional effect size statistics proposed by Millsap and Olivera-Aguilar (2012) are provided. For factor loadings, the additional outputs are the observed mean difference (diff_mean), the mean difference if factor scores are low (low_fscore), and the mean difference if factor scores are high (high_fscore). The low factor score is calculated by (a) finding the factor score whose z score equals -bound (the default is -2) in all groups and (b) picking the minimum value among those factor scores. The high factor score is calculated by (a) finding the factor score whose z score equals bound (default = 2) in all groups and (b) picking the maximum value among those factor scores. For measurement intercepts, the additional outputs are the observed mean difference (diff_mean) and the proportion of the difference in intercepts over the observed mean difference (propdiff). For error variances, the additional output is the proportion of the difference in error variances over the difference in observed variances (propdiff).
- results: Statistical tests as well as the change in CFI are provided. The \chi^2 statistic and p value are provided for all methods.
- models: The submodels used in the free and fix methods, as well as the nested and parent models. The nested and parent models will be changed from the original models if the free or fix arguments are specified.
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
References
Millsap, R. E., & Olivera-Aguilar, M. (2012). Investigating measurement invariance using confirmatory factor analysis. In R. H. Hoyle (Ed.), Handbook of structural equation modeling (pp. 380–392). New York, NY: Guilford.
Examples
## Conduct weak invariance testing manually by using fixed-factor
## method of scale identification
library(lavaan)
conf <- "
f1 =~ NA*x1 + x2 + x3
f2 =~ NA*x4 + x5 + x6
f1 ~~ c(1, 1)*f1
f2 ~~ c(1, 1)*f2
"
weak <- "
f1 =~ NA*x1 + x2 + x3
f2 =~ NA*x4 + x5 + x6
f1 ~~ c(1, NA)*f1
f2 ~~ c(1, NA)*f2
"
configural <- cfa(conf, data = HolzingerSwineford1939, std.lv = TRUE, group="school")
weak <- cfa(weak, data = HolzingerSwineford1939, group="school", group.equal="loadings")
models <- list(fit.configural = configural, fit.loadings = weak)
partialInvariance(models, "metric")
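## A brief (illustrative) extension of the example above, showing the 'free'
## and 'p.adjust' arguments (the indicator named in 'free' is arbitrary):
# partialInvariance(models, "metric", free = "x2", p.adjust = "holm")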
Permutation Randomization Tests of Measurement Equivalence and Differential Item Functioning (DIF)
Description
The function permuteMeasEq provides tests of hypotheses involving
measurement equivalence, in one of two frameworks: multigroup CFA or MIMIC
models.
Usage
permuteMeasEq(nPermute, modelType = c("mgcfa", "mimic"), con, uncon = NULL,
null = NULL, param = NULL, freeParam = NULL, covariates = NULL,
AFIs = NULL, moreAFIs = NULL, maxSparse = 10, maxNonconv = 10,
showProgress = TRUE, warn = -1, datafun, extra,
parallelType = c("none", "multicore", "snow"), ncpus = NULL, cl = NULL,
iseed = 12345)
Arguments
nPermute |
An integer indicating the number of random permutations used to form empirical distributions under the null hypothesis. |
modelType |
A character string indicating type of model employed:
multiple-group CFA ( |
con |
The constrained |
uncon |
Optional. The unconstrained |
null |
Optional. A |
param |
An optional character vector or list of character vectors
indicating which parameters the user would test for DIF following a
rejection of the omnibus null hypothesis tested using
( |
freeParam |
An optional character vector, silently ignored when
|
covariates |
An optional character vector, only applicable when
|
AFIs |
A character vector indicating which alternative fit indices (or
chi-squared itself) are to be used to test the multiparameter omnibus null
hypothesis that the constraints specified in |
moreAFIs |
Optional. A character vector indicating which (if any)
alternative fit indices returned by |
maxSparse |
Only applicable when |
maxNonconv |
An integer indicating the maximum number of consecutive
times that a random permutation can yield a sample for which the model does
not converge on a solution. If such a sample occurs, permutation is
attempted repeatedly until a sample is obtained for which the model does
converge. If |
showProgress |
Logical. Indicating whether to display a progress bar
while permuting. Silently set to |
warn |
Sets the handling of warning messages when fitting model(s) to
permuted data sets. See |
datafun |
An optional function that can be applied to the data
(extracted from |
extra |
An optional function that can be applied to any (or all) of the
fitted lavaan objects ( |
parallelType |
The type of parallel operation to be used (if any). The
default is |
ncpus |
Integer: number of processes to be used in parallel operation.
If |
cl |
An optional parallel or snow cluster for use when
|
iseed |
Integer: Only used to set the states of the RNG when using
parallel options, in which case |
Details
The function permuteMeasEq provides tests of hypotheses involving
measurement equivalence, in one of two frameworks:
1. For multiple-group CFA models, provide a pair of nested lavaan objects, the less constrained of which (uncon) freely estimates a set of measurement parameters (e.g., factor loadings, intercepts, or thresholds; specified in param) in all groups, and the more constrained of which (con) constrains those measurement parameters to equality across groups. Group assignment is repeatedly permuted and the models are fit to each permutation, in order to produce an empirical distribution under the null hypothesis of no group differences, both for (a) changes in user-specified fit measures (see AFIs and moreAFIs) and for (b) the maximum modification index among the user-specified equality constraints. Configural invariance can also be tested by providing that fitted lavaan object to con and leaving uncon = NULL, in which case param must be NULL as well.
2. In MIMIC models, one or a set of continuous and/or discrete covariates can be permuted, and a constrained model is fit to each permutation in order to provide a distribution of any fit measures (namely, the maximum modification index among fixed parameters in param) under the null hypothesis of measurement equivalence across levels of those covariates.
In either framework, modification indices for equality constraints or fixed
parameters specified in param are calculated from the constrained
model (con) using the function lavaan::lavTestScore().
For multiple-group CFA models, the multiparameter omnibus null hypothesis of
measurement equivalence/invariance is that there are no group differences in
any measurement parameters (of a particular type). This can be tested using
the anova method on nested lavaan objects, or by inspecting
the change in alternative fit indices (AFIs) such as the CFI. The
permutation randomization method employed by permuteMeasEq generates
an empirical distribution of any AFIs under the null hypothesis, so
the user is not restricted to using fixed cutoffs proposed by Cheung &
Rensvold (2002), Chen (2007), or Meade, Johnson, & Braddy (2008).
If the multiparameter omnibus null hypothesis is rejected, partial
invariance can still be established by freeing invalid equality constraints,
as long as equality constraints are valid for at least two indicators per
factor. Modification indices can be calculated from the constrained model
(con), but multiple testing leads to inflation of Type I error rates.
The permutation randomization method employed by permuteMeasEq
creates a distribution of the maximum modification index if the null
hypothesis is true, which allows the user to control the familywise Type I
error rate in a manner similar to Tukey's q (studentized range)
distribution for the Honestly Significant Difference (HSD) post hoc test.
For MIMIC models, DIF can be tested by comparing modification indices of
regression paths to the permutation distribution of the maximum modification
index, which controls the familywise Type I error rate. The MIMIC approach
could also be applied with multiple-group models, but the grouping variable
would not be permuted; rather, the covariates would be permuted separately
within each group to preserve between-group differences. So whether
parameters are constrained or unconstrained across groups, the MIMIC
approach is only for testing null hypotheses about the effects of
covariates on indicators, controlling for common factors.
In either framework, lavaan::lavaan()'s group.label
argument is used to preserve the order of groups seen in con when
permuting the data.
Value
The permuteMeasEq object representing the results of
testing measurement equivalence (the multiparameter omnibus test) and DIF
(modification indices), as well as diagnostics and any extra output.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Papers about permutation tests of measurement equivalence:
Jorgensen, T. D., Kite, B. A., Chen, P.-Y., & Short, S. D. (2018). Permutation randomization methods for testing measurement equivalence and detecting differential item functioning in multiple-group confirmatory factor analysis. Psychological Methods, 23(4), 708–728. doi:10.1037/met0000152
Kite, B. A., Jorgensen, T. D., & Chen, P.-Y. (2018). Random permutation testing applied to measurement invariance testing with ordered-categorical indicators. Structural Equation Modeling 25(4), 573–587. doi:10.1080/10705511.2017.1421467
Jorgensen, T. D. (2017). Applying permutation tests and multivariate modification indices to configurally invariant models that need respecification. Frontiers in Psychology, 8(1455). doi:10.3389/fpsyg.2017.01455
Additional reading:
Chen, F. F. (2007). Sensitivity of goodness of fit indexes to lack of measurement invariance. Structural Equation Modeling, 14(3), 464–504. doi:10.1080/10705510701301834
Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling, 9(2), 233–255. doi:10.1207/S15328007SEM0902_5
Meade, A. W., Johnson, E. C., & Braddy, P. W. (2008). Power and sensitivity of alternative fit indices in tests of measurement invariance. Journal of Applied Psychology, 93(3), 568–592. doi:10.1037/0021-9010.93.3.568
Widaman, K. F., & Thompson, J. S. (2003). On specifying the null model for incremental fit indices in structural equation modeling. Psychological Methods, 8(1), 16–37. doi:10.1037/1082-989X.8.1.16
See Also
stats::TukeyHSD(), lavaan::lavTestScore()
Examples
########################
## Multiple-Group CFA ##
########################
## create 3-group data in lavaan example(cfa) data
HS <- lavaan::HolzingerSwineford1939
HS$ageGroup <- ifelse(HS$ageyr < 13, "preteen",
ifelse(HS$ageyr > 13, "teen", "thirteen"))
## specify and fit an appropriate null model for incremental fit indices
mod.null <- c(paste0("x", 1:9, " ~ c(T", 1:9, ", T", 1:9, ", T", 1:9, ")*1"),
paste0("x", 1:9, " ~~ c(L", 1:9, ", L", 1:9, ", L", 1:9, ")*x", 1:9))
fit.null <- cfa(mod.null, data = HS, group = "ageGroup")
## fit target model with varying levels of measurement equivalence
mod.config <- '
visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
'
fit.config <- cfa(mod.config, data = HS, std.lv = TRUE, group = "ageGroup")
fit.metric <- cfa(mod.config, data = HS, std.lv = TRUE, group = "ageGroup",
group.equal = "loadings")
fit.scalar <- cfa(mod.config, data = HS, std.lv = TRUE, group = "ageGroup",
group.equal = c("loadings","intercepts"))
####################### Permutation Method
## fit indices of interest for multiparameter omnibus test
myAFIs <- c("chisq","cfi","rmsea","mfi","aic")
moreAFIs <- c("gammaHat","adjGammaHat")
## Use only 20 permutations for a demo. In practice,
## use > 1000 to reduce sampling variability of estimated p values
## test configural invariance
set.seed(12345)
out.config <- permuteMeasEq(nPermute = 20, con = fit.config)
out.config
## test metric equivalence
set.seed(12345) # same permutations
out.metric <- permuteMeasEq(nPermute = 20, uncon = fit.config, con = fit.metric,
param = "loadings", AFIs = myAFIs,
moreAFIs = moreAFIs, null = fit.null)
summary(out.metric, nd = 4)
## test scalar equivalence
set.seed(12345) # same permutations
out.scalar <- permuteMeasEq(nPermute = 20, uncon = fit.metric, con = fit.scalar,
param = "intercepts", AFIs = myAFIs,
moreAFIs = moreAFIs, null = fit.null)
summary(out.scalar)
## Not much to see without significant DIF.
## Try using an absurdly high alpha level for illustration.
outsum <- summary(out.scalar, alpha = .50)
## notice that the returned object is the table of DIF tests
outsum
## visualize permutation distribution
hist(out.config, AFI = "chisq")
hist(out.metric, AFI = "chisq", nd = 2, alpha = .01,
legendArgs = list(x = "topright"))
hist(out.scalar, AFI = "cfi", printLegend = FALSE)
####################### Extra Output
## function to calculate expected change of Group-2 and -3 latent means if
## each intercept constraint were released
extra <- function(con) {
output <- list()
output["x1.vis2"] <- lavTestScore(con, release = 19:20, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[70]
output["x1.vis3"] <- lavTestScore(con, release = 19:20, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[106]
output["x2.vis2"] <- lavTestScore(con, release = 21:22, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[70]
output["x2.vis3"] <- lavTestScore(con, release = 21:22, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[106]
output["x3.vis2"] <- lavTestScore(con, release = 23:24, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[70]
output["x3.vis3"] <- lavTestScore(con, release = 23:24, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[106]
output["x4.txt2"] <- lavTestScore(con, release = 25:26, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[71]
output["x4.txt3"] <- lavTestScore(con, release = 25:26, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[107]
output["x5.txt2"] <- lavTestScore(con, release = 27:28, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[71]
output["x5.txt3"] <- lavTestScore(con, release = 27:28, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[107]
output["x6.txt2"] <- lavTestScore(con, release = 29:30, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[71]
output["x6.txt3"] <- lavTestScore(con, release = 29:30, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[107]
output["x7.spd2"] <- lavTestScore(con, release = 31:32, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[72]
output["x7.spd3"] <- lavTestScore(con, release = 31:32, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[108]
output["x8.spd2"] <- lavTestScore(con, release = 33:34, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[72]
output["x8.spd3"] <- lavTestScore(con, release = 33:34, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[108]
output["x9.spd2"] <- lavTestScore(con, release = 35:36, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[72]
output["x9.spd3"] <- lavTestScore(con, release = 35:36, univariate = FALSE,
epc = TRUE, warn = FALSE)$epc$epc[108]
output
}
## observed EPC
extra(fit.scalar)
## permutation results, including extra output
set.seed(12345) # same permutations
out.scalar <- permuteMeasEq(nPermute = 20, uncon = fit.metric, con = fit.scalar,
param = "intercepts", AFIs = myAFIs,
moreAFIs = moreAFIs, null = fit.null, extra = extra)
## summarize extra output
summary(out.scalar, extra = TRUE)
###########
## MIMIC ##
###########
## Specify Restricted Factor Analysis (RFA) model, equivalent to MIMIC, but
## the factor covaries with the covariate instead of being regressed on it.
## The covariate defines a single-indicator construct, and the
## double-mean-centered products of the indicators define a latent
## interaction between the factor and the covariate.
mod.mimic <- '
visual =~ x1 + x2 + x3
age =~ ageyr
age.by.vis =~ x1.ageyr + x2.ageyr + x3.ageyr
x1 ~~ x1.ageyr
x2 ~~ x2.ageyr
x3 ~~ x3.ageyr
'
HS.orth <- indProd(var1 = paste0("x", 1:3), var2 = "ageyr", match = FALSE,
data = HS[ , c("ageyr", paste0("x", 1:3))] )
fit.mimic <- cfa(mod.mimic, data = HS.orth, meanstructure = TRUE)
summary(fit.mimic, stand = TRUE)
## Whereas MIMIC models specify direct effects of the covariate on an indicator,
## DIF can be tested in RFA models by specifying free loadings of an indicator
## on the covariate's construct (uniform DIF, scalar invariance) and the
## interaction construct (nonuniform DIF, metric invariance).
param <- as.list(paste0("age + age.by.vis =~ x", 1:3))
names(param) <- paste0("x", 1:3)
# param <- as.list(paste0("x", 1:3, " ~ age + age.by.vis")) # equivalent
## test both parameters simultaneously for each indicator
do.call(rbind, lapply(param, function(x) lavTestScore(fit.mimic, add = x)$test))
## or test each parameter individually
lavTestScore(fit.mimic, add = as.character(param))
####################### Permutation Method
## function to recalculate interaction terms after permuting the covariate
datafun <- function(data) {
d <- data[, c(paste0("x", 1:3), "ageyr")]
indProd(var1 = paste0("x", 1:3), var2 = "ageyr", match = FALSE, data = d)
}
set.seed(12345)
perm.mimic <- permuteMeasEq(nPermute = 20, modelType = "mimic",
con = fit.mimic, param = param,
covariates = "ageyr", datafun = datafun)
summary(perm.mimic)
Class for the Results of Permutation Randomization Tests of Measurement Equivalence and DIF
Description
This class contains the results of tests of Measurement Equivalence and Differential Item Functioning (DIF).
Usage
## S4 method for signature 'permuteMeasEq'
show(object)
## S4 method for signature 'permuteMeasEq'
summary(object, alpha = 0.05, nd = 3,
extra = FALSE)
## S4 method for signature 'permuteMeasEq'
hist(x, ..., AFI, alpha = 0.05, nd = 3,
printLegend = TRUE, legendArgs = list(x = "topleft"))
Arguments
object, x |
object of class |
alpha |
alpha level used to draw confidence limits in |
nd |
number of digits to display |
extra |
|
... |
Additional arguments to pass to |
AFI |
|
printLegend |
|
legendArgs |
|
Value
The show method prints a summary of the multiparameter omnibus test results, using the user-specified AFIs. The parametric (\Delta)\chi^2 test is also displayed.
The summary method prints the same information as the show method, but when extra = FALSE (the default) it also provides a table summarizing any requested follow-up tests of DIF using modification indices in slot MI.obs. The user can also specify an alpha level for flagging modification indices as significant, as well as nd (the number of digits displayed). For each modification index, the p value is displayed using a central \chi^2 distribution with the df shown in that column. Additionally, a p value is displayed using the permutation distribution of the maximum index, which controls the familywise Type I error rate in a manner similar to Tukey's studentized range test. If any indices are flagged as significant using the tukey.p.value, then a message is displayed for each flagged index. The invisibly returned data.frame is the displayed table of modification indices, unless permuteMeasEq() was called with param = NULL, in which case the invisibly returned object is object. If extra = TRUE, the permutation-based p values for each statistic returned by the extra function are displayed and returned in a data.frame instead of the modification indices requested in the param argument.
The hist method returns a list of length == 2, containing the arguments for the call to hist and the arguments for the call to legend, respectively. This list may facilitate creating a customized histogram of AFI.dist, MI.dist, or extra.dist.
Slots
PTA: A data.frame returned by a call to lavaan::parTable() on the constrained model
modelType: A character indicating the specified modelType in the call to permuteMeasEq
ANOVA: A numeric vector indicating the results of the observed (\Delta)\chi^2 test, based on the central \chi^2 distribution
AFI.obs: A vector of observed (changes in) user-selected fit measures
AFI.dist: The permutation distribution(s) of user-selected fit measures. A data.frame with n.Permutations rows and one column for each AFI.obs.
AFI.pval: A vector of p values (one for each element in slot AFI.obs) calculated using slot AFI.dist, indicating the probability of observing a change at least as extreme as AFI.obs if the null hypothesis were true
MI.obs: A data.frame of observed Lagrange multipliers (modification indices) associated with the equality constraints or fixed parameters specified in the param argument. This is a subset of the output returned by a call to lavaan::lavTestScore() on the constrained model.
MI.dist: The permutation distribution of the maximum modification index (among those seen in slot MI.obs$X2) at each permutation of group assignment or of covariates
extra.obs: If permuteMeasEq was called with an extra function, the output when applied to the original data is concatenated into this vector
extra.dist: A data.frame, each column of which contains the permutation distribution of the corresponding statistic in slot extra.obs
n.Permutations: An integer indicating the number of permutations requested by the user
n.Converged: An integer indicating the number of permutation iterations which yielded a converged solution
n.nonConverged: An integer vector of length n.Permutations indicating how many times group assignment was randomly permuted (at each iteration) before converging on a solution
n.Sparse: Only relevant with ordered indicators when modelType == "mgcfa". An integer vector of length n.Permutations indicating how many times group assignment was randomly permuted (at each iteration) before obtaining a sample with all categories observed in all groups.
oldSeed: An integer vector storing the value of .Random.seed before running permuteMeasEq. Only relevant when using a parallel/multicore option and the original RNGkind() != "L'Ecuyer-CMRG". This enables users to restore their previous .Random.seed state, if desired, by running: .Random.seed[-1] <- permutedResults@oldSeed[-1]
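As an illustration of working with these slots (assuming the out.metric object created in the permuteMeasEq() examples), the @ operator can be used to access them directly, e.g., to recompute a familywise-adjusted p value from the permutation distribution of the maximum modification index. This is only a sketch of slot access; the summary method reports the same quantities.
## Sketch of direct slot access; assumes `out.metric` from the
## permuteMeasEq() examples has already been created
out.metric@AFI.pval        # permutation-based p values for each (change in) AFI
head(out.metric@MI.obs)    # observed modification indices for the tested constraints
## familywise-adjusted p value for the first tested constraint: the proportion
## of permutations whose maximum modification index meets or exceeds it
mean(out.metric@MI.dist >= out.metric@MI.obs$X2[1])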
Objects from the Class
Objects can be created via the
permuteMeasEq() function.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
See Also
permuteMeasEq()
Examples
# See the example from the permuteMeasEq function
Plausible-Values Imputation of Factor Scores Estimated from a lavaan Model
Description
Draw plausible values of factor scores estimated from a fitted
lavaan::lavaan() model, then treat them as multiple imputations
of missing data using lavaan.mi::lavaan.mi().
Usage
plausibleValues(object, nDraws = 20L, seed = 12345,
omit.imps = c("no.conv", "no.se"), ...)
Arguments
object |
A fitted model of class lavaan::lavaan, blavaan::blavaan, or lavaan.mi::lavaan.mi |
nDraws |
|
seed |
|
omit.imps |
|
... |
Optional arguments to pass to |
Details
Because latent variables are unobserved, they can be considered as missing
data, which can be imputed using Monte Carlo methods. This may be of
interest to researchers with sample sizes too small to fit their complex
structural models. Fitting a factor model as a first step,
lavaan::lavPredict() provides factor-score estimates, which can
be treated as observed values in a path analysis (Step 2). However, the
resulting standard errors and test statistics cannot be trusted because the
Step-2 analysis does not take into account the uncertainty about the
estimated factor scores. Using the asymptotic sampling covariance matrix
of the factor scores provided by lavaan::lavPredict(),
plausibleValues draws a set of nDraws imputations from the
sampling distribution of each factor score, returning a list of data sets
that can be treated like multiple imputations of incomplete data. If the
data were already imputed to handle missing data, plausibleValues
also accepts an object of class lavaan.mi::lavaan.mi, and will
draw nDraws plausible values from each imputation. Step 2 would
then take into account uncertainty about both missing values and factor
scores. Bayesian methods can also be used to generate factor scores, as
available with the blavaan package, in which case plausible
values are simply saved parameters from the posterior distribution. See
Asparouhov and Muthen (2010) for further technical details and references.
Each returned data.frame includes a case.idx column that
indicates the corresponding rows in the data set to which the model was
originally fitted (unless the user requests only Level-2 variables). This
can be used to merge the plausible values with the original observed data,
but users should note that including any new variables in a Step-2 model
might not accurately account for their relationship(s) with factor scores
because they were not accounted for in the Step-1 model from which factor
scores were estimated.
If object is a multilevel lavaan model, users can request
plausible values for latent variables at particular levels of analysis by
setting the lavaan::lavPredict() argument level=1 or
level=2. If the level argument is not passed via ...,
then both levels are returned in a single merged data set per draw. For
multilevel models, each returned data.frame also includes a column
indicating to which cluster each row belongs (unless the user requests only
Level-2 variables).
Value
A list of length nDraws, each of which is a
data.frame containing plausible values, which can be treated as
a list of imputed data sets to be passed to the lavaan.mi package
(see Examples). If object is of class
lavaan.mi::lavaan.mi, the list will be of length
nDraws*m, where m is the number of imputations.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Asparouhov, T. & Muthen, B. O. (2010). Plausible values for latent variables using Mplus. Technical Report. Retrieved from www.statmodel.com/download/Plausible.pdf
See Also
lavaan.mi::lavaan.mi(), lavaan.mi::lavaan.mi
Examples
## example from ?cfa and ?lavPredict help pages
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '
fit1 <- cfa(HS.model, data = HolzingerSwineford1939)
fs1 <- plausibleValues(fit1, nDraws = 3,
## lavPredict() can add only the modeled data
append.data = TRUE)
lapply(fs1, head)
## To merge factor scores to original data.frame (not just modeled data)
fs1 <- plausibleValues(fit1, nDraws = 3)
idx <- lavInspect(fit1, "case.idx") # row index for each case
if (is.list(idx)) idx <- do.call(c, idx) # for multigroup models
data(HolzingerSwineford1939) # copy data to workspace
HolzingerSwineford1939$case.idx <- idx # add row index as variable
## loop over draws to merge original data with factor scores
for (i in seq_along(fs1)) {
fs1[[i]] <- merge(fs1[[i]], HolzingerSwineford1939, by = "case.idx")
}
lapply(fs1, head)
## multiple-group analysis, in 2 steps
step1 <- cfa(HS.model, data = HolzingerSwineford1939, group = "school",
group.equal = c("loadings","intercepts"))
PV.list <- plausibleValues(step1)
## subsequent path analysis
path.model <- ' visual ~ c(t1, t2)*textual + c(s1, s2)*speed '
if(requireNamespace("lavaan.mi")){
library(lavaan.mi)
step2 <- sem.mi(path.model, data = PV.list, group = "school")
## test equivalence of both slopes across groups
lavTestWald.mi(step2, constraints = 't1 == t2 ; s1 == s2')
}
## multilevel example from ?Demo.twolevel help page
model <- '
level: 1
fw =~ y1 + y2 + y3
fw ~ x1 + x2 + x3
level: 2
fb =~ y1 + y2 + y3
fb ~ w1 + w2
'
msem <- sem(model, data = Demo.twolevel, cluster = "cluster")
mlPVs <- plausibleValues(msem, nDraws = 3) # both levels by default
lapply(mlPVs, head, n = 10)
## only Level 1
mlPV1 <- plausibleValues(msem, nDraws = 3, level = 1)
lapply(mlPV1, head)
## only Level 2
mlPV2 <- plausibleValues(msem, nDraws = 3, level = 2)
lapply(mlPV2, head)
## example with 20 multiple imputations of missing data:
nPVs <- 5
nImps <- 20
if (requireNamespace("lavaan.mi")) {
data(HS20imps, package = "lavaan.mi")
## specify CFA model from lavaan's ?cfa help page
HS.model <- '
visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
'
out2 <- cfa.mi(HS.model, data = HS20imps)
PVs <- plausibleValues(out2, nDraws = nPVs)
idx <- out2@Data@case.idx # can't use lavInspect() on lavaan.mi
## empty list to hold expanded imputations
impPVs <- list()
for (m in 1:nImps) {
HS20imps[[m]]["case.idx"] <- idx
for (i in 1:nPVs) {
impPVs[[ nPVs*(m - 1) + i ]] <- merge(HS20imps[[m]],
PVs[[ nPVs*(m - 1) + i ]],
by = "case.idx")
}
}
lapply(impPVs, head)
}
Plot a latent interaction
Description
This function will plot the line graphs representing the simple effect of the independent variable given the values of the moderator. For multigroup models, it will only generate a plot for 1 group, as specified in the function used to obtain the first argument.
Usage
plotProbe(object, xlim, xlab = "Indepedent Variable",
ylab = "Dependent Variable", legend = TRUE, legendArgs = list(), ...)
Arguments
object |
A |
xlim |
The vector of two numbers: the minimum and maximum values of the independent variable |
xlab |
The label of the x-axis |
ylab |
The label of the y-axis |
legend |
|
legendArgs |
|
... |
Any additional argument for the |
Value
None. This function will plot the simple main effect only.
Note
If the object does not contain simple intercepts (i.e., if the
object$SimpleIntcept element is NULL), then all simple
intercepts are arbitrarily set to zero in order to plot the simple slopes.
This may not be consistent with the fitted model, but was (up until version
0.5-7) the default behavior when the y-intercept was fixed to 0. In this case,
although the relative steepness of simple slopes can still meaningfully be
compared, the relative vertical positions of lines at any point along the
x-axis should not be interpreted.
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Schoemann, A. M., & Jorgensen, T. D. (2021). Testing and interpreting
latent variable interactions using the semTools package.
Psych, 3(3), 322–335. doi:10.3390/psych3030024
See Also
- indProd() For creating the indicator products with no centering, mean centering, double-mean centering, or residual centering.
- probe2WayMC() For probing the two-way latent interaction when the results are obtained from mean centering or double-mean centering
- probe3WayMC() For probing the three-way latent interaction when the results are obtained from mean centering or double-mean centering
- probe2WayRC() For probing the two-way latent interaction when the results are obtained from the residual-centering approach.
- probe3WayRC() For probing the three-way latent interaction when the results are obtained from the residual-centering approach.
Examples
library(lavaan)
dat2wayMC <- indProd(dat2way, 1:3, 4:6)
model1 <- "
f1 =~ x1 + x2 + x3
f2 =~ x4 + x5 + x6
f12 =~ x1.x4 + x2.x5 + x3.x6
f3 =~ x7 + x8 + x9
f3 ~ f1 + f2 + f12
f12 ~~ 0*f1
f12 ~~ 0*f2
x1 ~ 0*1
x4 ~ 0*1
x1.x4 ~ 0*1
x7 ~ 0*1
f1 ~ NA*1
f2 ~ NA*1
f12 ~ NA*1
f3 ~ NA*1
"
fitMC2way <- sem(model1, data = dat2wayMC, meanstructure = TRUE)
result2wayMC <- probe2WayMC(fitMC2way, nameX = c("f1", "f2", "f12"),
nameY = "f3", modVar = "f2", valProbe = c(-1, 0, 1))
plotProbe(result2wayMC, xlim = c(-2, 2))
dat3wayMC <- indProd(dat3way, 1:3, 4:6, 7:9)
model3 <- "
f1 =~ x1 + x2 + x3
f2 =~ x4 + x5 + x6
f3 =~ x7 + x8 + x9
f12 =~ x1.x4 + x2.x5 + x3.x6
f13 =~ x1.x7 + x2.x8 + x3.x9
f23 =~ x4.x7 + x5.x8 + x6.x9
f123 =~ x1.x4.x7 + x2.x5.x8 + x3.x6.x9
f4 =~ x10 + x11 + x12
f4 ~ f1 + f2 + f3 + f12 + f13 + f23 + f123
f1 ~~ 0*f12
f1 ~~ 0*f13
f1 ~~ 0*f123
f2 ~~ 0*f12
f2 ~~ 0*f23
f2 ~~ 0*f123
f3 ~~ 0*f13
f3 ~~ 0*f23
f3 ~~ 0*f123
f12 ~~ 0*f123
f13 ~~ 0*f123
f23 ~~ 0*f123
x1 ~ 0*1
x4 ~ 0*1
x7 ~ 0*1
x10 ~ 0*1
x1.x4 ~ 0*1
x1.x7 ~ 0*1
x4.x7 ~ 0*1
x1.x4.x7 ~ 0*1
f1 ~ NA*1
f2 ~ NA*1
f3 ~ NA*1
f12 ~ NA*1
f13 ~ NA*1
f23 ~ NA*1
f123 ~ NA*1
f4 ~ NA*1
"
fitMC3way <- sem(model3, data = dat3wayMC, std.lv = FALSE,
meanstructure = TRUE)
result3wayMC <- probe3WayMC(fitMC3way, nameX = c("f1", "f2", "f3", "f12",
"f13", "f23", "f123"),
nameY = "f4", modVar = c("f1", "f2"),
valProbe1 = c(-1, 0, 1), valProbe2 = c(-1, 0, 1))
plotProbe(result3wayMC, xlim = c(-2, 2))
Plot the sampling distributions of RMSEA
Description
Plots the sampling distributions of RMSEA based on the noncentral chi-square distributions
Usage
plotRMSEAdist(rmsea, n, df, ptile = NULL, caption = NULL,
rmseaScale = TRUE, group = 1)
Arguments
rmsea |
The vector of RMSEA values to be plotted |
n |
Sample size of a dataset |
df |
Model degrees of freedom |
ptile |
The percentile rank of the distribution of the first RMSEA that users wish to plot a vertical line in the resulting graph |
caption |
The name vector of each element of |
rmseaScale |
If |
group |
The number of groups used to calculate RMSEA. |
Details
This function creates overlapping plots of the sampling distribution of
RMSEA based on the noncentral \chi^2 distribution (MacCallum, Browne, &
Sugawara, 1996). First, the noncentrality parameter (\lambda) is
calculated from RMSEA (Steiger, 1998; Dudgeon, 2004) by
\lambda = (N - 1)d\varepsilon^2 / K,
where N is the sample size, d is the model degrees of freedom, K is the
number of groups, and \varepsilon is the population RMSEA. Next, the
noncentral \chi^2 distribution with the specified df and noncentrality
parameter is plotted. Thus, the x-axis represents the sample \chi^2 value.
The sample \chi^2 value can be transformed to the sample RMSEA scale
(\hat{\varepsilon}) by
\hat{\varepsilon} = \sqrt{K}\sqrt{\frac{\chi^2 - d}{(N - 1)d}},
where \chi^2 is the \chi^2 value obtained from the noncentral
\chi^2 distribution.
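The two formulas above can be reproduced with base R. The following sketch is purely illustrative (it is not the internal code of plotRMSEAdist(); all object names are arbitrary): it computes the noncentrality parameter for one population RMSEA and converts draws from the corresponding noncentral \chi^2 distribution back to the sample-RMSEA scale.
## Illustrative sketch of the formulas above (not the plotRMSEAdist() internals)
N   <- 200   # sample size
d   <- 20    # model degrees of freedom
K   <- 1     # number of groups
eps <- .05   # population RMSEA
lambda <- (N - 1) * d * eps^2 / K           # noncentrality parameter
x2 <- rchisq(10000, df = d, ncp = lambda)   # draws of the sample chi-squared value
## convert to the sample RMSEA scale (truncated at zero, as is conventional)
eps.hat <- sqrt(K) * sqrt(pmax(x2 - d, 0) / ((N - 1) * d))
hist(eps.hat, main = "Sampling distribution of RMSEA when population RMSEA = .05")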
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
References
Dudgeon, P. (2004). A note on extending Steiger's (1998) multiple sample RMSEA adjustment to other noncentrality parameter-based statistic. Structural Equation Modeling, 11(3), 305–319. doi:10.1207/s15328007sem1103_1
MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1(2), 130–149. doi:10.1037/1082-989X.1.2.130
Steiger, J. H. (1998). A note on multiple sample extensions of the RMSEA fit index. Structural Equation Modeling, 5(4), 411–419. doi:10.1080/10705519809540115
See Also
- plotRMSEApower() to plot the statistical power based on population RMSEA given the sample size
- findRMSEApower() to find the statistical power based on population RMSEA given a sample size
- findRMSEAsamplesize() to find the minimum sample size for a given statistical power based on population RMSEA
Examples
plotRMSEAdist(c(.05, .08), n = 200, df = 20, ptile = .95, rmseaScale = TRUE)
plotRMSEAdist(c(.05, .01), n = 200, df = 20, ptile = .05, rmseaScale = FALSE)
Plot power curves for RMSEA
Description
Plots power of RMSEA over a range of sample sizes
Usage
plotRMSEApower(rmsea0, rmseaA, df, nlow, nhigh, steps = 1, alpha = 0.05,
group = 1, ...)
Arguments
rmsea0 |
Null RMSEA |
rmseaA |
Alternative RMSEA |
df |
Model degrees of freedom |
nlow |
Lower sample size |
nhigh |
Upper sample size |
steps |
Increase in sample size for each iteration. Smaller values of steps will lead to more precise plots; however, smaller step sizes mean a longer run time. |
alpha |
Alpha level used in power calculations |
group |
The number of groups used to calculate RMSEA. |
... |
The additional arguments for the plot function. |
Details
This function creates a plot of power for RMSEA against a range of sample sizes. The plot places sample size on the horizontal axis and power on the vertical axis. The user should indicate the lower and upper values for sample size and the sample size between each estimate ("step size"). We strongly urge the user to read the sources below (see References) before proceeding. A web version of this function is available at: https://quantpsy.org/rmsea/rmseaplot.htm. This function is also implemented in the web application "power4SEM": https://sjak.shinyapps.io/power4SEM/
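For a single sample size, the power calculation underlying such a plot can be sketched with noncentral \chi^2 distributions, following MacCallum et al. (1996). The code below is an illustrative approximation (assuming rmseaA > rmsea0), not the internal code of plotRMSEApower() or findRMSEApower():
## Illustrative power calculation for one sample size (MacCallum et al., 1996)
rmsea0 <- .025   # null-hypothesized population RMSEA
rmseaA <- .075   # alternative population RMSEA (assumed > rmsea0 here)
df <- 23; n <- 300; alpha <- .05; group <- 1
ncp0 <- (n - 1) * df * rmsea0^2 / group          # noncentrality under H0
ncpA <- (n - 1) * df * rmseaA^2 / group          # noncentrality under H1
crit <- qchisq(1 - alpha, df = df, ncp = ncp0)   # critical chi-squared value
power <- pchisq(crit, df = df, ncp = ncpA, lower.tail = FALSE)
power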
Value
Plot of power for RMSEA against a range of sample sizes
Author(s)
Alexander M. Schoemann (East Carolina University; schoemanna@ecu.edu)
Kristopher J. Preacher (Vanderbilt University; kris.preacher@vanderbilt.edu)
Donna L. Coffman (Pennsylvania State University; dlc30@psu.edu)
References
MacCallum, R. C., Browne, M. W., & Cai, L. (2006). Testing differences between nested covariance structure models: Power analysis and null hypotheses. Psychological Methods, 11(1), 19–35. doi:10.1037/1082-989X.11.1.19
MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1(2), 130–149. doi:10.1037/1082-989X.1.2.130
MacCallum, R. C., Lee, T., & Browne, M. W. (2010). The issue of isopower in power analysis for tests of structural equation models. Structural Equation Modeling, 17(1), 23–41. doi:10.1080/10705510903438906
Preacher, K. J., Cai, L., & MacCallum, R. C. (2007). Alternatives to traditional model comparison strategies for covariance structure models. In T. D. Little, J. A. Bovaird, & N. A. Card (Eds.), Modeling contextual effects in longitudinal studies (pp. 33–62). Mahwah, NJ: Lawrence Erlbaum Associates.
Steiger, J. H. (1998). A note on multiple sample extensions of the RMSEA fit index. Structural Equation Modeling, 5(4), 411–419. doi:10.1080/10705519809540115
Steiger, J. H., & Lind, J. C. (1980, June). Statistically based tests for the number of factors. Paper presented at the annual meeting of the Psychometric Society, Iowa City, IA.
Jak, S., Jorgensen, T. D., Verdam, M. G., Oort, F. J., & Elffers, L. (2021). Analytical power calculations for structural equation modeling: A tutorial and Shiny app. Behavior Research Methods, 53, 1385–1406. doi:10.3758/s13428-020-01479-0
See Also
- plotRMSEAdist() to visualize the RMSEA distributions
- findRMSEApower() to find the statistical power based on population RMSEA given a sample size
- findRMSEAsamplesize() to find the minimum sample size for a given statistical power based on population RMSEA
Examples
plotRMSEApower(rmsea0 = .025, rmseaA = .075, df = 23,
nlow = 100, nhigh = 500, steps = 10)
Plot power of nested model RMSEA
Description
Plot power of nested model RMSEA over a range of possible sample sizes.
Usage
plotRMSEApowernested(rmsea0A = NULL, rmsea0B = NULL, rmsea1A,
rmsea1B = NULL, dfA, dfB, nlow, nhigh, steps = 1, alpha = 0.05,
group = 1, ...)
Arguments
rmsea0A |
The |
rmsea0B |
The |
rmsea1A |
The |
rmsea1B |
The |
dfA |
Degrees of freedom of the more-restricted model |
dfB |
Degrees of freedom of the less-restricted model |
nlow |
Lower bound of sample size |
nhigh |
Upper bound of sample size |
steps |
Step size |
alpha |
The alpha level |
group |
The number of groups used in calculating RMSEA |
... |
The additional arguments for the plot function. |
Author(s)
Bell Clinton
Pavel Panko (Texas Tech University; pavel.panko@ttu.edu)
Sunthud Pornprasertmanit (psunthud@gmail.com)
References
MacCallum, R. C., Browne, M. W., & Cai, L. (2006). Testing differences between nested covariance structure models: Power analysis and null hypotheses. Psychological Methods, 11(1), 19–35. doi:10.1037/1082-989X.11.1.19
See Also
- findRMSEApowernested() to find the power for a given sample size in nested model comparison based on population RMSEA
- findRMSEAsamplesizenested() to find the minimum sample size for a given statistical power in nested model comparison based on population RMSEA
Examples
plotRMSEApowernested(rmsea0A = 0, rmsea0B = 0, rmsea1A = 0.06,
rmsea1B = 0.05, dfA = 22, dfB = 20, nlow = 50,
nhigh = 500, steps = 1, alpha = .05, group = 1)
Combine sampling variability with parcel-allocation variability by pooling results across M parcel-allocations
Description
This function employs an iterative algorithm to pick the number of random
item-to-parcel allocations needed to meet user-defined stability criteria
for a fitted structural equation model (SEM) (see Details below for
more information). Pooled point and standard-error estimates from this SEM
can be outputted at this final selected number of allocations (however, it
is more efficient to save the allocations and treat them as multiple
imputations using lavaan.mi::lavaan.mi(); see See Also for links with
examples). Additionally, new indices (see Sterba & Rights, 2016) are
outputted for assessing the relative contributions of parcel-allocation
variability vs. sampling variability in each estimate. At each iteration,
this function generates a given number of random item-to-parcel allocations,
fits a SEM to each allocation, pools estimates across allocations from that
iteration, and then assesses whether stopping criteria are met. If stopping
criteria are not met, the algorithm increments the number of allocations
used (generating all new allocations).
Usage
poolMAlloc(nPerPar, facPlc, nAllocStart, nAllocAdd = 0,
parceloutput = NULL, syntax, dataset, stopProp, stopValue,
selectParam = NULL, indices = "default", double = FALSE,
checkConv = FALSE, names = "default", leaveout = 0,
useTotalAlloc = FALSE, ...)
Arguments
nPerPar |
A list in which each element is a vector, corresponding to each factor, indicating sizes of parcels. If variables are left out of parceling, they should not be accounted for here (i.e., there should not be parcels of size "1"). |
facPlc |
A list of vectors, each corresponding to a factor, specifying the item indicators of that factor (whether included in parceling or not). Either variable names or column numbers. Variables not listed will not be modeled or included in output datasets. |
nAllocStart |
The number of random allocations of items to parcels to generate in the first iteration of the algorithm. |
nAllocAdd |
The number of allocations to add with each iteration of the
algorithm. Note that if only one iteration is desired, |
parceloutput |
Optional |
syntax |
lavaan syntax that defines the model. |
dataset |
Item-level dataset |
stopProp |
Value used in defining stopping criteria of the algorithm
( |
stopValue |
Value used in defining stopping criteria of the algorithm
( |
selectParam |
(Optional) A list of the pooled parameters to be used in
defining stopping criteria (i.e., |
indices |
Optional |
double |
(Optional) If set to |
checkConv |
(Optional) If set to TRUE, function will output pooled estimates and standard errors from 10 iterations post-convergence. |
names |
(Optional) A character vector containing the names of parceled variables. |
leaveout |
(Optional) A vector of variables to be left out of randomized parceling. Either variable names or column numbers are allowed. |
useTotalAlloc |
(Optional) If set to |
... |
Additional arguments to be passed to
|
Details
This function implements an algorithm for choosing the number of allocations (M; described in Sterba & Rights, 2016), pools point and standard-error estimates across these M allocations, and produces indices for assessing the relative contributions of parcel-allocation variability vs. sampling variability in each estimate.
To obtain pooled test statistics for model fit or model comparison, the
list of parcel allocations can be passed to lavaan.mi::lavaan.mi()
(find Examples on the help pages for parcelAllocation()
and PAVranking()).
This function randomly generates a given number (nAllocStart) of
item-to-parcel allocations, fits a SEM to each allocation, and then
increments the number of allocations used (by nAllocAdd) until the
pooled point and standard-error estimates fulfill stopping criteria
(stopProp and stopValue, defined above). A summary of results
from the model that was fit to the M allocations are returned.
Additionally, this function outputs the proportion of allocations with solutions that converged (using a maximum likelihood estimator) as well as the proportion of allocations with solutions that were converged and proper. The converged and proper solutions among the final M allocations are used in computing pooled results.
Additionally, after each iteration of the algorithm, information useful in monitoring the algorithm is outputted. The number of allocations used at that iteration, the proportion of pooled parameter estimates meeting stopping criteria at the previous iteration, the proportion of pooled standard errors meeting stopping criteria at the previous iteration, and the runtime of that iteration are outputted. When stopping criteria are satisfied, the full set of results are outputted.
For further details on the benefits of the random allocation of items to parcels, see Sterba (2011) and Sterba & MacCallum (2010).
Value
Estimates |
A table containing pooled results across M
allocations at the iteration where stopping criteria were met. Columns
correspond to individual parameter name, pooled estimate, pooled standard
error, p value for a z test of the parameter, normal-theory |
Fit |
A table containing results related to model fit from the M allocations at the iteration where stopping criteria were met. Columns correspond to fit index names, the mean of each index across allocations, the SD of each fit index across allocations, the minimum, maximum and range of each fit index across allocations, and the percent of the M allocations where the chi-square test of absolute fit was significant. |
Proportions |
A table containing the proportion of the final M allocations that (a) met the optimizer convergence criteria and (b) converged to proper solutions. Note that pooled estimates, pooled standard errors, and other results are computed using only the converged, proper allocations. |
Stability |
The number of allocations (M) needed for stability, at which point the algorithm's stopping criteria (defined above) were met. |
Uncertainty |
Indices used to quantify uncertainty in estimates due to sample vs. allocation variability. A table containing individual parameter names, an estimate of the proportion of total variance of a pooled parameter estimate that is attributable to parcel-allocation variability (PPAV), and an estimate of the ratio of the between-allocation variance of a pooled parameter estimate to the within-allocation variance (RPAV). See Sterba & Rights (2016) for more detail. |
Time |
The total runtime of the function, in minutes. Note that the
total runtime will be greater when the specified model encounters
convergence problems for some allocations, as is the case with the
|
Author(s)
Jason D. Rights (Vanderbilt University; jason.d.rights@vanderbilt.edu)
The author would also like to credit Corbin Quick and Alexander Schoemann
for providing the original parcelAllocation() function (prior to its
revision by Terrence D. Jorgensen) on which this function is based.
References
Sterba, S. K. (2011). Implications of parcel-allocation variability for comparing fit of item-solutions and parcel-solutions. Structural Equation Modeling, 18(4), 554–577. doi:10.1080/10705511.2011.607073
Sterba, S. K., & MacCallum, R. C. (2010). Variability in parameter estimates and model fit across random allocations of items to parcels. Multivariate Behavioral Research, 45(2), 322–358. doi:10.1080/00273171003680302
Sterba, S. K., & Rights, J. D. (2016). Accounting for parcel-allocation variability in practice: Combining sources of uncertainty and choosing the number of allocations. Multivariate Behavioral Research, 51(2–3), 296–313. doi:10.1080/00273171.2016.1144502
Sterba, S. K., & Rights, J. D. (2017). Effects of parceling on model selection: Parcel-allocation variability in model ranking. Psychological Methods, 22(1), 47–68. doi:10.1037/met0000067
See Also
lavaan.mi::lavaan.mi() for treating allocations as multiple imputations
to pool results across allocations. See Examples on help pages for
parcelAllocation() (when fitting a single model) and PAVranking()
(when comparing 2 models).
Examples
## lavaan syntax: A 2 Correlated
## factor CFA model to be fit to parceled data
parmodel <- '
f1 =~ NA*p1f1 + p2f1 + p3f1
f2 =~ NA*p1f2 + p2f2 + p3f2
p1f1 ~ 1
p2f1 ~ 1
p3f1 ~ 1
p1f2 ~ 1
p2f2 ~ 1
p3f2 ~ 1
p1f1 ~~ p1f1
p2f1 ~~ p2f1
p3f1 ~~ p3f1
p1f2 ~~ p1f2
p2f2 ~~ p2f2
p3f2 ~~ p3f2
f1 ~~ 1*f1
f2 ~~ 1*f2
f1 ~~ f2
'
## specify items for each factor
f1name <- colnames(simParcel)[1:9]
f2name <- colnames(simParcel)[10:18]
## run function
poolMAlloc(nPerPar = list(c(3,3,3), c(3,3,3)),
facPlc = list(f1name, f2name), nAllocStart = 10, nAllocAdd = 10,
syntax = parmodel, dataset = simParcel, stopProp = .03,
stopValue = .03, selectParam = c(1:6, 13:18, 21),
names = list("p1f1","p2f1","p3f1","p1f2","p2f2","p3f2"),
double = FALSE, useTotalAlloc = FALSE)
## See examples on ?parcelAllocation and ?PAVranking for how to obtain
## pooled test statistics and other pooled lavaan output.
## Details provided in Sterba & Rights (2016).
Probing two-way interaction on the no-centered or mean-centered latent interaction
Description
Probing interaction for simple intercept and simple slope for the no-centered or mean-centered latent two-way interaction
Usage
probe2WayMC(fit, nameX, nameY, modVar, valProbe, group = 1L,
omit.imps = c("no.conv", "no.se"))
Arguments
fit |
A fitted lavaan::lavaan or lavaan.mi::lavaan.mi object with a latent 2-way interaction. |
nameX |
|
nameY |
The name of factor that is used as the dependent variable. |
modVar |
The name of factor that is used as a moderator. The effect of
the other independent factor will be probed at each value of the
moderator variable listed in |
valProbe |
The values of the moderator that will be used to probe the effect of the focal predictor. |
group |
In multigroup models, the label of the group for which the
results will be returned. Must correspond to one of
|
omit.imps |
|
Details
Before using this function, researchers need to make the products of the
indicators between the first-order factors using mean centering (Marsh, Wen,
& Hau, 2004). Note that the double-mean centering may not be appropriate for
probing interaction if researchers are interested in simple intercepts. The
mean or double-mean centering can be done by the indProd()
function. The indicator products can be made for all possible combinations or
using the matched-pair approach (Marsh et al., 2004). Next, the hypothesized model
with the regression with latent interaction will be used to fit all original
indicators and the product terms. See the example for how to fit the product
term below. Once the lavaan result is obtained, this function will be used
to probe the interaction.
Let the latent interaction model regressing the dependent variable
(Y) on the independent variable (X) and the moderator (Z) be
Y = b_0 + b_1X + b_2Z + b_3XZ + r,
where b_0 is the estimated intercept or the expected value of Y when both
X and Z are 0, b_1 is the effect of X when Z is 0, b_2 is the effect of
Z when X is 0, b_3 is the interaction effect between X and Z, and r is
the residual term.
To probe a two-way interaction, the simple intercept of the independent variable at each value of the moderator (Aiken & West, 1991; Cohen, Cohen, West, & Aiken, 2003; Preacher, Curran, & Bauer, 2006) can be obtained by
b_{0|X = 0, Z} = b_0 + b_2 Z.
The simple slope of the independent variable at each value of the moderator can be obtained by
b_{X|Z} = b_1 + b_3 Z.
The variance of the simple intercept formula is
Var\left(b_{0|X = 0, Z}\right) = Var\left(b_0\right) + 2Z \times Cov\left(b_0, b_2\right) + Z^2 \times Var\left(b_2\right),
where Var denotes the variance of a parameter estimate and Cov denotes the
covariance of two parameter estimates. The variance of the simple slope formula is
Var\left(b_{X|Z}\right) = Var\left(b_1\right) + 2Z \times Cov\left(b_1, b_3\right) + Z^2 \times Var\left(b_3\right).
The Wald z statistic is used as the test statistic (even for objects of class lavaan.mi::lavaan.mi).
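These formulas can also be verified by hand from a fitted lavaan model. The sketch below is illustrative only: it assumes the fitMC2way model from the Examples section and lavaan's default parameter labels (e.g., "f3~f1"); probe2WayMC() carries out the equivalent computations, including the standard errors, internally.
## Illustrative check of the simple-slope formula above; assumes `fitMC2way`
## from the Examples below and lavaan's default labels (e.g., "f3~f1")
est <- coef(fitMC2way)   # free-parameter estimates
V   <- vcov(fitMC2way)   # their sampling covariance matrix
b1 <- unname(est["f3~f1"])    # effect of f1 when f2 = 0
b3 <- unname(est["f3~f12"])   # interaction effect
z  <- 1                       # a chosen value of the moderator f2
slope    <- b1 + b3 * z
slope.se <- sqrt(V["f3~f1", "f3~f1"] + 2 * z * V["f3~f1", "f3~f12"] +
                 z^2 * V["f3~f12", "f3~f12"])
c(slope = slope, SE = slope.se, Wald.z = slope / slope.se)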
Value
A list with two elements:
-
SimpleIntercept: The simple intercepts given each value of the moderator. -
SimpleSlope: The simple slopes given each value of the moderator.
In each element, the first column represents the values of the moderator
specified in the valProbe argument. The second column is the simple
intercept or simple slope. The third column is the SE of the simple
intercept or simple slope. The fourth column is the Wald (z)
statistic, and the fifth column is the associated p value testing
the null hypothesis that each simple intercept or slope is 0.
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Tutorial:
Schoemann, A. M., & Jorgensen, T. D. (2021). Testing and interpreting
latent variable interactions using the semTools package.
Psych, 3(3), 322–335. doi:10.3390/psych3030024
Background literature:
Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Newbury Park, CA: Sage.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). New York, NY: Routledge.
Marsh, H. W., Wen, Z., & Hau, K. T. (2004). Structural equation models of latent interactions: Evaluation of alternative estimation strategies and indicator construction. Psychological Methods, 9(3), 275–300. doi:10.1037/1082-989X.9.3.275
Preacher, K. J., Curran, P. J., & Bauer, D. J. (2006). Computational tools for probing interactions in multiple linear regression, multilevel modeling, and latent curve analysis. Journal of Educational and Behavioral Statistics, 31(4), 437–448. doi:10.3102/10769986031004437
See Also
- indProd() For creating the indicator products with no centering, mean centering, double-mean centering, or residual centering.
- probe3WayMC() For probing the three-way latent interaction when the results are obtained from mean centering or double-mean centering
- probe2WayRC() For probing the two-way latent interaction when the results are obtained from the residual-centering approach.
- probe3WayRC() For probing the three-way latent interaction when the results are obtained from the residual-centering approach.
- plotProbe() Plot the simple intercepts and slopes of the latent interaction.
Examples
dat2wayMC <- indProd(dat2way, 1:3, 4:6) # double mean centered by default
model1 <- "
f1 =~ x1 + x2 + x3
f2 =~ x4 + x5 + x6
f12 =~ x1.x4 + x2.x5 + x3.x6
f3 =~ x7 + x8 + x9
f3 ~ f1 + f2 + f12
f12 ~~ 0*f1 + 0*f2 # not necessary, but implied by double mean centering
"
fitMC2way <- sem(model1, data = dat2wayMC, meanstructure = TRUE)
summary(fitMC2way)
probe2WayMC(fitMC2way, nameX = c("f1", "f2", "f12"), nameY = "f3",
modVar = "f2", valProbe = c(-1, 0, 1))
## can probe multigroup models, one group at a time
dat2wayMC$g <- 1:2
model2 <- "
f1 =~ x1 + x2 + x3
f2 =~ x4 + x5 + x6
f12 =~ x1.x4 + x2.x5 + x3.x6
f3 =~ x7 + x8 + x9
f3 ~ c(b1.g1, b1.g2)*f1 + c(b2.g1, b2.g2)*f2 + c(b12.g1, b12.g2)*f12
f12 ~~ 0*f1 + 0*f2
"
fit2 <- sem(model2, data = dat2wayMC, group = "g")
probe2WayMC(fit2, nameX = c("f1", "f2", "f12"), nameY = "f3",
modVar = "f2", valProbe = c(-1, 0, 1)) # group = 1 by default
probe2WayMC(fit2, nameX = c("f1", "f2", "f12"), nameY = "f3",
modVar = "f2", valProbe = c(-1, 0, 1), group = 2)
Probing two-way interaction on the residual-centered latent interaction
Description
Probing interaction for simple intercept and simple slope for the residual-centered latent two-way interaction (Geldhof et al., 2013)
Usage
probe2WayRC(fit, nameX, nameY, modVar, valProbe, group = 1L,
omit.imps = c("no.conv", "no.se"))
Arguments
fit |
A fitted lavaan::lavaan or lavaan.mi::lavaan.mi object with a latent 2-way interaction. |
nameX |
|
nameY |
The name of factor that is used as the dependent variable. |
modVar |
The name of factor that is used as a moderator. The effect of
the other independent factor will be probed at each value of the
moderator variable listed in |
valProbe |
The values of the moderator that will be used to probe the effect of the focal predictor. |
group |
In multigroup models, the label of the group for which the
results will be returned. Must correspond to one of
|
omit.imps |
|
Details
Before using this function, researchers need to make the products of the
indicators between the first-order factors and residualize the products by
the original indicators (Lance, 1988; Little, Bovaird, & Widaman, 2006). The
process can be automated by the indProd() function. Note that
the indicator products can be made for all possible combination or
matched-pair approach (Marsh et al., 2004). Next, the hypothesized model
with the regression with latent interaction will be used to fit all original
indicators and the product terms. To use this function the model must be fit
with a mean structure. See the example for how to fit the product term
below. Once the lavaan result is obtained, this function will be used to
probe the interaction.
The probing process on residual-centered latent interaction is based on
transforming the residual-centered result into the no-centered result. See
Geldhof et al. (2013) for further details. Note that this approach is based
on a strong assumption that the first-order latent variables are normally
distributed. The probing process is applied after the no-centered result
(parameter estimates and their covariance matrix among parameter estimates)
has been computed. See the probe2WayMC() for further details.
Value
A list with two elements:
-
SimpleIntercept: The simple intercepts given each value of the moderator. -
SimpleSlope: The simple slopes given each value of the moderator.
In each element, the first column represents the values of the moderators
specified in the valProbe argument. The second column is the simple
intercept or simple slope. The third column is the standard error of the
simple intercept or slope. The fourth column is the Wald (z)
statistic, and the fifth column is the associated p value testing
the null hypothesis that each simple intercept or slope is 0.
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Tutorial:
Schoemann, A. M., & Jorgensen, T. D. (2021). Testing and interpreting
latent variable interactions using the semTools package.
Psych, 3(3), 322–335. doi:10.3390/psych3030024
Background literature:
Lance, C. E. (1988). Residual centering, exploratory and confirmatory moderator analysis, and decomposition of effects in path models containing interactions. Applied Psychological Measurement, 12(2), 163–175. doi:10.1177/014662168801200205
Little, T. D., Bovaird, J. A., & Widaman, K. F. (2006). On the merits of orthogonalizing powered and product terms: Implications for modeling interactions. Structural Equation Modeling, 13(4), 497–519. doi:10.1207/s15328007sem1304_1
Marsh, H. W., Wen, Z., & Hau, K. T. (2004). Structural equation models of latent interactions: Evaluation of alternative estimation strategies and indicator construction. Psychological Methods, 9(3), 275–300. doi:10.1037/1082-989X.9.3.275
Geldhof, G. J., Pornprasertmanit, S., Schoemann, A. M., & Little, T. D. (2013). Orthogonalizing through residual centering: Extended applications and caveats. Educational and Psychological Measurement, 73(1), 27–46. doi:10.1177/0013164412445473
See Also
- indProd() For creating the indicator products with no centering, mean centering, double-mean centering, or residual centering.
- probe2WayMC() For probing the two-way latent interaction when the results are obtained from mean centering or double-mean centering
- probe3WayMC() For probing the three-way latent interaction when the results are obtained from mean centering or double-mean centering
- probe3WayRC() For probing the three-way latent interaction when the results are obtained from the residual-centering approach.
- plotProbe() Plot the simple intercepts and slopes of the latent interaction.
Examples
dat2wayRC <- orthogonalize(dat2way, 1:3, 4:6)
model1 <- "
f1 =~ x1 + x2 + x3
f2 =~ x4 + x5 + x6
f12 =~ x1.x4 + x2.x5 + x3.x6
f3 =~ x7 + x8 + x9
f3 ~ f1 + f2 + f12
f12 ~~ 0*f1 + 0*f2
x1 + x4 + x1.x4 + x7 ~ 0*1 # identify latent means
f1 + f2 + f12 + f3 ~ NA*1
"
fitRC2way <- sem(model1, data = dat2wayRC, meanstructure = TRUE)
summary(fitRC2way)
probe2WayRC(fitRC2way, nameX = c("f1", "f2", "f12"), nameY = "f3",
modVar = "f2", valProbe = c(-1, 0, 1))
## can probe multigroup models, one group at a time
dat2wayRC$g <- 1:2
model2 <- "
f1 =~ x1 + x2 + x3
f2 =~ x4 + x5 + x6
f12 =~ x1.x4 + x2.x5 + x3.x6
f3 =~ x7 + x8 + x9
f3 ~ c(b1.g1, b1.g2)*f1 + c(b2.g1, b2.g2)*f2 + c(b12.g1, b12.g2)*f12
f12 ~~ 0*f1 + 0*f2
x1 + x4 + x1.x4 + x7 ~ 0*1 # identify latent means
f1 + f2 + f12 ~ NA*1
f3 ~ NA*1 + c(b0.g1, b0.g2)*1
"
fit2 <- sem(model2, data = dat2wayRC, group = "g")
probe2WayRC(fit2, nameX = c("f1", "f2", "f12"), nameY = "f3",
modVar = "f2", valProbe = c(-1, 0, 1)) # group = 1 by default
probe2WayRC(fit2, nameX = c("f1", "f2", "f12"), nameY = "f3",
modVar = "f2", valProbe = c(-1, 0, 1), group = 2)
Probing three-way interaction on the no-centered or mean-centered latent interaction
Description
Probing interaction for simple intercept and simple slope for the no-centered or mean-centered latent three-way interaction
Usage
probe3WayMC(fit, nameX, nameY, modVar, valProbe1, valProbe2, group = 1L,
omit.imps = c("no.conv", "no.se"))
Arguments
fit |
A fitted lavaan::lavaan or lavaan.mi::lavaan.mi object with a latent 3-way interaction. |
nameX |
|
nameY |
The name of the factor used as the dependent variable. |
modVar |
The names of the two factors used as moderators. The effect of the independent factor will be probed at each combination of the moderators' chosen values. |
valProbe1 |
The values of the first moderator that will be used to probe the effect of the independent factor. |
valProbe2 |
The values of the second moderator that will be used to probe the effect of the independent factor. |
group |
In multigroup models, the label of the group for which the
results will be returned. Must correspond to one of
|
omit.imps |
|
Details
Before using this function, researchers need to make the products of the
indicators between the first-order factors using mean centering (Marsh, Wen,
& Hau, 2004). Note that the double-mean centering may not be appropriate for
probing interaction if researchers are interested in simple intercepts. The
mean or double-mean centering can be done by the indProd()
function. The indicator products can be made for all possible combination or
matched-pair approach (Marsh et al., 2004). Next, the hypothesized model
with the regression with latent interaction will be used to fit all original
indicators and the product terms. See the example for how to fit the product
term below. Once the lavaan result is obtained, this function will be used
to probe the interaction.
Let the latent interaction model regressing the dependent variable
(Y) on the independent variable (X) and two moderators (Z
and W) be
Y = b_0 + b_1X + b_2Z + b_3W + b_4XZ + b_5XW + b_6ZW + b_7XZW + r,
where b_0 is the estimated intercept or the expected
value of Y when X, Z, and W are 0, b_1 is the
effect of X when Z and W are 0, b_2 is the effect of
Z when X and W are 0, b_3 is the effect of W
when X and Z are 0, b_4 is the interaction effect between
X and Z when W is 0, b_5 is the interaction effect
between X and W when Z is 0, b_6 is the interaction
effect between Z and W when X is 0, b_7 is the
three-way interaction effect between X, Z, and W, and
r is the residual term.
To probe a three-way interaction, the simple intercept of the independent variable at the specific values of the moderators (Aiken & West, 1991) can be obtained by
b_{0|X = 0, Z, W} = b_0 + b_2Z + b_3W + b_6ZW.
The simple slope of the independent variable at the specific values of the moderators can be obtained by
b_{X|Z, W} = b_1 + b_4Z + b_5W + b_7ZW.
The variance of the simple intercept formula is
Var\left(b_{0|X = 0, Z, W}\right) =
Var\left(b_0\right) + Z^2Var\left(b_2\right) + W^2Var\left(b_3\right) +
Z^2W^2Var\left(b_6\right)
+ 2ZCov\left(b_0, b_2\right) + 2WCov\left(b_0, b_3\right) +
2ZWCov\left(b_0, b_6\right) + 2ZWCov\left(b_2, b_3\right) +
2Z^2WCov\left(b_2, b_6\right) + 2ZW^2Cov\left(b_3, b_6\right),
where Var denotes the variance of a parameter estimate and Cov
denotes the covariance of two parameter estimates.
The variance of the simple slope formula is
Var\left(b_{X|Z, W}\right) =
Var\left(b_1\right) + Z^2Var\left(b_4\right) + W^2Var\left(b_5\right)
+ Z^2W^2Var\left(b_7\right)
+ 2ZCov\left(b_1, b_4\right) + 2WCov\left(b_1, b_5\right) +
2ZWCov\left(b_1, b_7\right) + 2ZWCov\left(b_4, b_5\right) +
2Z^2WCov\left(b_4, b_7\right) + 2ZW^2Cov\left(b_5, b_7\right).
Wald z statistics are calculated (even for objects of class lavaan.mi::lavaan.mi) to test null hypotheses that simple intercepts or slopes are 0.
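To make the simple-slope formula and its standard error concrete, the following is a minimal sketch (not how probe3WayMC() is implemented), assuming a fitted lavaan model fit whose latent regression labels the relevant slopes "b1", "b4", "b5", and "b7" as in the equation above:
est <- coef(fit)                 # labeled parameter estimates
V   <- vcov(fit)                 # sampling covariance matrix of the estimates
Z <- 1; W <- -1                  # chosen values of the two moderators
idx   <- c("b1", "b4", "b5", "b7")
wts   <- c(1, Z, W, Z * W)       # weights in the simple-slope formula
slope <- sum(wts * est[idx])     # b1 + b4*Z + b5*W + b7*Z*W
SE    <- sqrt(drop(t(wts) %*% V[idx, idx] %*% wts))
c(slope = slope, SE = SE, wald.z = slope / SE)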
Value
A list with two elements:
- SimpleIntercept: The model-implied intercepts given each combination of moderator values.
- SimpleSlope: The model-implied slopes given each combination of moderator values.
In each element, the first column represents values of the first moderator
specified in the valProbe1 argument. The second column represents
values of the second moderator specified in the valProbe2 argument.
The third column is the simple intercept or simple slope. The fourth column
is the standard error of the simple intercept or simple slope. The fifth
column is the Wald (z) statistic, and the sixth column is its
associated p value to test the null hypothesis that each simple
intercept or simple slope equals 0.
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Tutorial:
Schoemann, A. M., & Jorgensen, T. D. (2021). Testing and interpreting
latent variable interactions using the semTools package.
Psych, 3(3), 322–335. doi:10.3390/psych3030024
Background literature:
Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Newbury Park, CA: Sage.
Marsh, H. W., Wen, Z., & Hau, K. T. (2004). Structural equation models of latent interactions: Evaluation of alternative estimation strategies and indicator construction. Psychological Methods, 9(3), 275–300. doi:10.1037/1082-989X.9.3.275
See Also
- indProd() For creating the indicator products with no centering, mean centering, double-mean centering, or residual centering.
- probe2WayMC() For probing the two-way latent interaction when the results are obtained from the mean-centering or double-mean-centering approach.
- probe2WayRC() For probing the two-way latent interaction when the results are obtained from the residual-centering approach.
- probe3WayRC() For probing the three-way latent interaction when the results are obtained from the residual-centering approach.
- plotProbe() Plot the simple intercepts and slopes of the latent interaction.
Examples
dat3wayMC <- indProd(dat3way, 1:3, 4:6, 7:9)
model3 <- " ## define latent variables
f1 =~ x1 + x2 + x3
f2 =~ x4 + x5 + x6
f3 =~ x7 + x8 + x9
## 2-way interactions
f12 =~ x1.x4 + x2.x5 + x3.x6
f13 =~ x1.x7 + x2.x8 + x3.x9
f23 =~ x4.x7 + x5.x8 + x6.x9
## 3-way interaction
f123 =~ x1.x4.x7 + x2.x5.x8 + x3.x6.x9
## outcome variable
f4 =~ x10 + x11 + x12
## latent regression model
f4 ~ b1*f1 + b2*f2 + b3*f3 + b12*f12 + b13*f13 + b23*f23 + b123*f123
## orthogonal terms among predictors
## (not necessary, but implied by double mean centering)
f1 ~~ 0*f12 + 0*f13 + 0*f123
f2 ~~ 0*f12 + 0*f23 + 0*f123
f3 ~~ 0*f13 + 0*f23 + 0*f123
f12 + f13 + f23 ~~ 0*f123
"
fitMC3way <- sem(model3, data = dat3wayMC, meanstructure = TRUE)
summary(fitMC3way)
probe3WayMC(fitMC3way, nameX = c("f1" ,"f2" ,"f3",
"f12","f13","f23", # this order matters!
"f123"), # 3-way interaction
nameY = "f4", modVar = c("f1", "f2"),
valProbe1 = c(-1, 0, 1), valProbe2 = c(-1, 0, 1))
Probing three-way interaction on the residual-centered latent interaction
Description
Probing interaction for simple intercept and simple slope for the residual-centered latent three-way interaction (Geldhof et al., 2013)
Usage
probe3WayRC(fit, nameX, nameY, modVar, valProbe1, valProbe2, group = 1L,
omit.imps = c("no.conv", "no.se"))
Arguments
fit |
A fitted lavaan::lavaan or lavaan.mi::lavaan.mi object with a latent 3-way interaction. |
nameX |
|
nameY |
The name of the factor used as the dependent variable. |
modVar |
The names of the two factors used as moderators. The effect of the independent factor will be probed at each combination of the moderators' values. |
valProbe1 |
The values of the first moderator that will be used to probe the effect of the independent factor. |
valProbe2 |
The values of the second moderator that will be used to probe the effect of the independent factor. |
group |
In multigroup models, the label of the group for which the
results will be returned. Must correspond to one of
|
omit.imps |
|
Details
Before using this function, researchers need to create products of the
indicators of the first-order factors and residualize the products on the
original indicators (Lance, 1988; Little, Bovaird, & Widaman, 2006). The
process can be automated with the indProd() function. Note that
the indicator products can be created using either all possible combinations
or the matched-pair approach (Marsh et al., 2004). Next, the hypothesized
model, which includes the latent interaction in the regression, is fitted to
all original indicators and the product terms (Geldhof et al., 2013). To use
this function, the model must be fitted with a mean structure (see the
example below for how to specify the product term). Once the lavaan results
are obtained, this function can be used to probe the interaction.
Probing a residual-centered latent interaction is based on transforming the
residual-centered results into no-centered results; see Geldhof et al. (2013)
for further details. Note that this approach rests on the strong assumption
that the first-order latent variables are normally distributed. The probing
process is applied after the no-centered results (parameter estimates and
their sampling covariance matrix) have been computed. See
probe3WayMC() for further details.
Value
A list with two elements:
- SimpleIntercept: The model-implied intercepts given each combination of moderator values.
- SimpleSlope: The model-implied slopes given each combination of moderator values.
In each element, the first column represents values of the first moderator
specified in the valProbe1 argument. The second column represents
values of the second moderator specified in the valProbe2 argument.
The third column is the simple intercept or simple slope. The fourth column
is the SE of the simple intercept or simple slope. The fifth column
is the Wald (z) statistic, and the sixth column is its associated
p value to test the null hypothesis that each simple intercept or
simple slope equals 0.
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Tutorial:
Schoemann, A. M., & Jorgensen, T. D. (2021). Testing and interpreting
latent variable interactions using the semTools package.
Psych, 3(3), 322–335. doi:10.3390/psych3030024
Background literature:
Geldhof, G. J., Pornprasertmanit, S., Schoemann, A., & Little, T. D. (2013). Orthogonalizing through residual centering: Extended applications and caveats. Educational and Psychological Measurement, 73(1), 27–46. doi:10.1177/0013164412445473
Lance, C. E. (1988). Residual centering, exploratory and confirmatory moderator analysis, and decomposition of effects in path models containing interactions. Applied Psychological Measurement, 12(2), 163–175. doi:10.1177/014662168801200205
Little, T. D., Bovaird, J. A., & Widaman, K. F. (2006). On the merits of orthogonalizing powered and product terms: Implications for modeling interactions. Structural Equation Modeling, 13(4), 497–519. doi:10.1207/s15328007sem1304_1
Marsh, H. W., Wen, Z., & Hau, K. T. (2004). Structural equation models of latent interactions: Evaluation of alternative estimation strategies and indicator construction. Psychological Methods, 9(3), 275–300. doi:10.1037/1082-989X.9.3.275
Pornprasertmanit, S., Schoemann, A. M., Geldhof, G. J., & Little, T. D. (submitted). Probing latent interaction estimated with a residual centering approach.
See Also
- indProd() For creating the indicator products with no centering, mean centering, double-mean centering, or residual centering.
- probe2WayMC() For probing the two-way latent interaction when the results are obtained from the mean-centering or double-mean-centering approach.
- probe3WayMC() For probing the three-way latent interaction when the results are obtained from the mean-centering or double-mean-centering approach.
- probe2WayRC() For probing the two-way latent interaction when the results are obtained from the residual-centering approach.
- plotProbe() Plot the simple intercepts and slopes of the latent interaction.
Examples
dat3wayRC <- orthogonalize(dat3way, 1:3, 4:6, 7:9)
model3 <- " ## define latent variables
f1 =~ x1 + x2 + x3
f2 =~ x4 + x5 + x6
f3 =~ x7 + x8 + x9
## 2-way interactions
f12 =~ x1.x4 + x2.x5 + x3.x6
f13 =~ x1.x7 + x2.x8 + x3.x9
f23 =~ x4.x7 + x5.x8 + x6.x9
## 3-way interaction
f123 =~ x1.x4.x7 + x2.x5.x8 + x3.x6.x9
## outcome variable
f4 =~ x10 + x11 + x12
## latent regression model
f4 ~ b1*f1 + b2*f2 + b3*f3 + b12*f12 + b13*f13 + b23*f23 + b123*f123
## orthogonal terms among predictors
f1 ~~ 0*f12 + 0*f13 + 0*f123
f2 ~~ 0*f12 + 0*f23 + 0*f123
f3 ~~ 0*f13 + 0*f23 + 0*f123
f12 + f13 + f23 ~~ 0*f123
## identify latent means
x1 + x4 + x7 + x1.x4 + x1.x7 + x4.x7 + x1.x4.x7 + x10 ~ 0*1
f1 + f2 + f3 + f12 + f13 + f23 + f123 + f4 ~ NA*1
"
fitRC3way <- sem(model3, data = dat3wayRC, meanstructure = TRUE)
summary(fitRC3way)
probe3WayRC(fitRC3way, nameX = c("f1" ,"f2" ,"f3",
"f12","f13","f23", # this order matters!
"f123"), # 3-way interaction
nameY = "f4", modVar = c("f1", "f2"),
valProbe1 = c(-1, 0, 1), valProbe2 = c(-1, 0, 1))
Quark
Description
The quark function allows researchers to calculate and include
component scores that take into account the variance of the original dataset
as well as all of its interaction and polynomial effects.
Usage
quark(data, id, order = 1, silent = FALSE, ...)
Arguments
data |
The data frame is a required component for |
id |
Identifiers and dates within the dataset will need to be
acknowledged as |
order |
Order is an optional argument provided by quark that can be
used when the imputation procedures in mice fail. Under some circumstances,
mice cannot calculate missing values due to issues with extreme missingness.
Should an error present itself stating a failure due to not having any
columns selected, set the argument |
silent |
If |
... |
additional arguments to pass to |
Details
The quark function calculates these component scores by first filling
in the data via means of multiple imputation methods and then expanding the
dataset by aggregating the non-overlapping interaction effects between
variables by calculating the mean of the interactions and polynomial
effects. The multiple imputation methods include one of iterative sampling
and group mean substitution and multiple imputation using a polytomous
regression algorithm (mice). During the expansion process, the dataset is
expanded to three times its normal size (in width). The first third of the
dataset contains all of the original data post imputation, the second third
contains the means of the polynomial effects (squares and cubes), and the
final third contains the means of the non-overlapping interaction effects. A
full principal componenent analysis is conducted and the individual
components are retained. The subsequent combinequark() function
provides researchers the control in determining how many components to
extract and retain. The function returns the dataset as submitted (with
missing values) and the component scores as requested for a more accurate
multiple imputation in subsequent steps.
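The expansion step can be illustrated with a rough sketch of the general idea only (this is not quark()'s actual algorithm, which also imputes missing data and averages the non-overlapping interaction terms); lavaan is assumed loaded so the example data are available:
X     <- scale(HolzingerSwineford1939[ , paste0("x", 1:4)], scale = FALSE)  # mean-centered items
polys <- X^2                                                     # polynomial (squared) terms
prods <- combn(ncol(X), 2, function(j) X[ , j[1]] * X[ , j[2]])  # pairwise products
big   <- cbind(X, polys, prods)                                  # expanded ("big") matrix
pc    <- prcomp(big)                                             # principal components
summary(pc)$importance["Proportion of Variance", 1:3]            # variance explained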
Value
The quark function returns a list with seven components.
ID Columns |
Is a vector of the identifier columns entered when running quark. |
ID Variables |
Is a subset of the dataset that contains the identifiers as acknowledged when running quark. |
Used Data |
Is a matrix / dataframe of the data provided by user as the basis for quark to process. |
Imputed Data |
Is a matrix / dataframe of the data after the multiple method imputation process. |
Big Matrix |
Is the expanded product and polynomial matrix. |
Principal Components |
Is the entire data frame of principal components for the dataset. It has the same number of rows as the big matrix, but one fewer column (as is the case with principal component analyses). |
Percent Variance Explained |
Is a vector of the percent variance explained with each column of principal components. |
Author(s)
Steven R. Chesnut (University of Southern Mississippi; Steven.Chesnut@usm.edu)
Danny Squire (Texas Tech University)
Terrence D. Jorgensen (University of Amsterdam)
The PCA code is copied and modified from the FactoMineR package.
References
Howard, W. J., Rhemtulla, M., & Little, T. D. (2015). Using Principal Components as Auxiliary Variables in Missing Data Estimation. Multivariate Behavioral Research, 50(3), 285–299. doi:10.1080/00273171.2014.999267
See Also
Examples
set.seed(123321)
dat <- HolzingerSwineford1939[,7:15]
misspat <- matrix(runif(nrow(dat) * 9) < 0.3, nrow(dat))
dat[misspat] <- NA
dat <- cbind(HolzingerSwineford1939[,1:3], dat)
quark.list <- quark(data = dat, id = c(1, 2))
final.data <- combinequark(quark = quark.list, percent = 80)
## Example to rerun quark after imputation failure:
quark.list <- quark(data = dat, id = c(1, 2), order = 2)
Composite Reliability using SEM
Description
Calculate composite reliability from estimated factor-model parameters
Usage
reliability(object, what = c("alpha", "omega", "omega2", "omega3", "ave"),
return.total = FALSE, dropSingle = TRUE, omit.factors = character(0),
omit.indicators = character(0), omit.imps = c("no.conv", "no.se"))
Arguments
object |
A lavaan::lavaan or lavaan.mi::lavaan.mi object, expected to contain only exogenous common factors (i.e., a CFA model). |
what |
|
return.total |
|
dropSingle |
|
omit.factors |
|
omit.indicators |
|
omit.imps |
|
Details
The coefficient alpha (Cronbach, 1951) can be calculated by
\alpha = \frac{k}{k - 1}\left[ 1 - \frac{\sum^{k}_{i = 1}
\sigma_{ii}}{\sum^{k}_{i = 1} \sigma_{ii} + 2\sum_{i < j} \sigma_{ij}}
\right],
where k is the number of items in a factor, \sigma_{ii} is the
observed variance of item i, and \sigma_{ij} is the observed
covariance of items i and j.
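As a minimal numeric sketch of this formula (assuming lavaan is loaded so that HolzingerSwineford1939 is available):
S <- cov(HolzingerSwineford1939[ , c("x1", "x2", "x3")])  # observed item covariance matrix
k <- ncol(S)
(k / (k - 1)) * (1 - sum(diag(S)) / sum(S))  # sum(S) = sum of variances + 2 * sum of covariances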
Several coefficients for factor-analysis reliability have been termed
"omega", which Cho (2021) argues is a misleading misnomer and argues for
using \rho to represent them all, differentiated by descriptive
subscripts. In our package, we number \omega based on commonly
applied calculations. Bentler (1968) first introduced factor-analysis
reliability for a unidimensional factor model with congeneric indicators.
However, assuming there are no cross-loadings in a multidimensional CFA,
this reliability coefficient can be calculated for each factor in the model.
\omega_1 =\frac{\left( \sum^{k}_{i = 1} \lambda_i \right)^{2}
Var\left( \psi \right)}{\left( \sum^{k}_{i = 1} \lambda_i \right)^{2}
Var\left( \psi \right) + \sum^{k}_{i = 1} \theta_{ii} + 2\sum_{i < j}
\theta_{ij} },
where \lambda_i is the factor loading of item i, \psi is
the factor variance, \theta_{ii} is the variance of measurement errors
of item i, and \theta_{ij} is the covariance of measurement
errors from item i and j. McDonald (1999) later referred to
this and other reliability coefficients as "omega", which is a source
of confusion when reporting coefficients (Cho, 2021).
The additional coefficients generalize the first formula by accounting for
multidimensionality (possibly with cross-loadings) and correlated errors.
By setting return.total=TRUE, one can estimate reliability for a
single composite calculated using all indicators in the multidimensional
CFA (Bentler, 1972, 2009). "omega2" is calculated by
\omega_2 = \frac{\left( \sum^{k}_{i = 1} \lambda_i \right)^{2}
Var\left( \psi \right)}{\bold{1}^\prime \hat{\Sigma} \bold{1}},
where \hat{\Sigma} is the model-implied covariance matrix, and
\bold{1} is the k-dimensional vector of 1. The first and the
second coefficients omega will have the same value per factor in models with
simple structure, but they differ when there are (e.g.) cross-loadings
or method factors. The first coefficient omega can be viewed as the
reliability controlling for the other factors (like \eta^2_{partial} in
ANOVA). The second coefficient omega can be viewed as the unconditional
reliability (like \eta^2 in ANOVA).
The "omega3" coefficient (McDonald, 1999), sometimes referred to as
hierarchical omega, can be calculated by
\omega_3 =\frac{\left( \sum^{k}_{i = 1} \lambda_i \right)^{2}
Var\left( \psi \right)}{\bold{1}^\prime \Sigma \bold{1}},
where \Sigma is the observed covariance matrix. If the model fits the
data well, \omega_3 will be similar to \omega_2. Note that if
there is a directional effect in the model, all coefficients are calculated
from total factor variances: lavInspect(object, "cov.lv").
In conclusion, \omega_1, \omega_2, and \omega_3 differ
in their denominators. The denominator of the first formula assumes a
congeneric factor model in which measurement errors are not correlated. The
second formula accounts for correlated measurement errors. However, both
formulas assume that the model-implied covariance matrix explains item
relationships perfectly, whereas the residuals are subject to sampling error.
The third formula uses the observed covariance matrix, rather than the
model-implied covariance matrix, to calculate the observed total variance.
This makes it the most conservative method of calculating coefficient
omega.
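To make the three denominators concrete, a minimal sketch (not the internal implementation) for a unidimensional CFA fitted with lavaan:
fit1  <- cfa(' visual =~ x1 + x2 + x3 ', data = HolzingerSwineford1939)
EST   <- lavInspect(fit1, "est")
lam   <- EST$lambda[ , "visual"]            # factor loadings
psi   <- EST$psi["visual", "visual"]        # factor variance
theta <- EST$theta                          # residual (co)variances
Sigma <- lavInspect(fit1, "cov.ov")         # model-implied covariance matrix
S     <- lavInspect(fit1, "sampstat")$cov   # observed covariance matrix
common <- sum(lam)^2 * psi                  # (sum of loadings)^2 times factor variance
c(omega1 = common / (common + sum(theta)),  # denominator: common + error (co)variances
  omega2 = common / sum(Sigma),             # denominator: model-implied total variance
  omega3 = common / sum(S))                 # denominator: observed total variance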
The average variance extracted (AVE) can be calculated by
AVE = \frac{\bold{1}^\prime
\textrm{diag}\left(\Lambda\Psi\Lambda^\prime\right)\bold{1}}{\bold{1}^\prime
\textrm{diag}\left(\hat{\Sigma}\right) \bold{1}},
Note that this formula is modified from Fornell & Larcker (1981), whose proposed formula assumes that the factor variances equal 1. AVE will not be provided for factors consisting of items with dual loadings, because AVE is a property of the items rather than of the factors. AVE is calculated using polychoric correlations when ordinal indicators are used.
Coefficient alpha is by definition applied by treating indicators as numeric
(see Chalmers, 2018), which is consistent with the alpha function in
the psych package. When indicators are ordinal, reliability
additionally applies the standard alpha calculation to the polychoric
correlation matrix to return Zumbo et al.'s (2007) "ordinal alpha".
Coefficient omega for categorical items is calculated using Green and Yang's
(2009, formula 21) approach. Three types of coefficient omega indicate
different methods to calculate item total variances. The original formula
from Green and Yang is equivalent to \omega_3 in this function.
Green and Yang did not propose a method for
calculating reliability with a mixture of categorical and continuous
indicators, and we are currently unaware of an appropriate method.
Therefore, when reliability detects both categorical and continuous
indicators of a factor, an error is returned. If the categorical indicators
load on a different factor(s) than continuous indicators, then reliability
will still be calculated separately for those factors, but
return.total must be FALSE (unless omit.factors is used
to isolate factors with indicators of the same type).
Value
Reliability values (coefficient alpha, coefficients omega, average
variance extracted) of each factor in each group. If there are multiple
factors, a total column can optionally be included.
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
Yves Rosseel (Ghent University; Yves.Rosseel@UGent.be)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Bentler, P. M. (1972). A lower-bound method for the dimension-free measurement of internal consistency. Social Science Research, 1(4), 343–357. doi:10.1016/0049-089X(72)90082-8
Bentler, P. M. (2009). Alpha, dimension-free, and model-based internal consistency reliability. Psychometrika, 74(1), 137–143. doi:10.1007/s11336-008-9100-1
Chalmers, R. P. (2018). On misconceptions and the limited usefulness of ordinal alpha. Educational and Psychological Measurement, 78(6), 1056–1071. doi:10.1177/0013164417727036
Cho, E. (2021). Neither Cronbach’s alpha nor McDonald’s omega: A commentary on Sijtsma and Pfadt. Psychometrika, 86(4), 877–886. doi:10.1007/s11336-021-09801-1
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297–334. doi:10.1007/BF02310555
Fornell, C., & Larcker, D. F. (1981). Evaluating structural equation models with unobservable variables and measurement errors. Journal of Marketing Research, 18(1), 39–50. doi:10.2307/3151312
Green, S. B., & Yang, Y. (2009). Reliability of summed item scores using structural equation modeling: An alternative to coefficient alpha. Psychometrika, 74(1), 155–167. doi:10.1007/s11336-008-9099-3
McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Erlbaum.
Raykov, T. (2001). Estimation of congeneric scale reliability using covariance structure analysis with nonlinear constraints. British Journal of Mathematical and Statistical Psychology, 54(2), 315–323. doi:10.1348/000711001159582
Zumbo, B. D., Gadermann, A. M., & Zeisser, C. (2007). Ordinal versions of coefficients alpha and theta for Likert rating scales. Journal of Modern Applied Statistical Methods, 6(1), 21–29. doi:10.22237/jmasm/1177992180
Zumbo, B. D., & Kroc, E. (2019). A measurement is a choice and Stevens’ scales of measurement do not help make it: A response to Chalmers. Educational and Psychological Measurement, 79(6), 1184–1197. doi:10.1177/0013164419844305
See Also
Examples
data(HolzingerSwineford1939)
HS9 <- HolzingerSwineford1939[ , c("x7","x8","x9")]
HSbinary <- as.data.frame( lapply(HS9, cut, 2, labels=FALSE) )
names(HSbinary) <- c("y7","y8","y9")
HS <- cbind(HolzingerSwineford1939, HSbinary)
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ y7 + y8 + y9 '
fit <- cfa(HS.model, data = HS, ordered = c("y7","y8","y9"), std.lv = TRUE)
## works for factors with exclusively continuous OR categorical indicators
reliability(fit)
## reliability for ALL indicators only available when they are
## all continuous or all categorical
reliability(fit, omit.factors = "speed", return.total = TRUE)
## loop over visual indicators to calculate alpha if one indicator is removed
for (i in paste0("x", 1:3)) {
cat("Drop x", i, ":\n")
print(reliability(fit, omit.factors = c("textual","speed"),
omit.indicators = i, what = "alpha"))
}
## works for multigroup models and for multilevel models (and both)
data(Demo.twolevel)
## assign clusters to arbitrary groups
Demo.twolevel$g <- ifelse(Demo.twolevel$cluster %% 2L, "type1", "type2")
model2 <- ' group: type1
level: within
fac =~ y1 + L2*y2 + L3*y3
level: between
fac =~ y1 + L2*y2 + L3*y3
group: type2
level: within
fac =~ y1 + L2*y2 + L3*y3
level: between
fac =~ y1 + L2*y2 + L3*y3
'
fit2 <- sem(model2, data = Demo.twolevel, cluster = "cluster", group = "g")
reliability(fit2, what = c("alpha","omega3"))
Calculate the reliability values of a second-order factor
Description
Calculate the reliability values (coefficient omega) of a second-order factor
Usage
reliabilityL2(object, secondFactor, omit.imps = c("no.conv", "no.se"))
Arguments
object |
A lavaan::lavaan or lavaan.mi::lavaan.mi object, expected to contain at least one exogenous higher-order common factor. |
secondFactor |
The name of a single second-order factor in the
model fitted in |
omit.imps |
|
Details
The first formula for coefficient omega (see
reliability()) is the basis of this calculation. The
model-implied covariance matrix of a second-order factor model can be
separated into three sources: the second-order common-factor variance,
the residual variance of the first-order common factors (i.e., not
accounted for by the second-order factor), and the measurement error of
observed indicators:
\hat{\Sigma} = \Lambda \bold{B} \Phi_2 \bold{B}^{\prime}
\Lambda^{\prime} + \Lambda \Psi_{u} \Lambda^{\prime} + \Theta,
where \hat{\Sigma} is the model-implied covariance matrix,
\Lambda contains first-order factor loadings, \bold{B} contains
second-order factor loadings, \Phi_2 is the covariance matrix of the
second-order factor(s), \Psi_{u} is the covariance matrix of residuals
from first-order factors, and \Theta is the covariance matrix of the
measurement errors from observed indicators. Thus, we can calculate the
proportion of variance of a composite score calculated from the observed
indicators (e.g., a total score or scale mean) that is attributable to the
second-order factor, i.e. coefficient omega at Level 1:
\omega_{L1} = \frac{\bold{1}^{\prime} \Lambda \bold{B} \Phi_2
\bold{B}^{\prime} \Lambda^{\prime} \bold{1}}{\bold{1}^{\prime} \Lambda
\bold{B} \Phi_2 \bold{B} ^{\prime} \Lambda^{\prime} \bold{1} +
\bold{1}^{\prime} \Lambda \Psi_{u} \Lambda^{\prime} \bold{1} +
\bold{1}^{\prime} \Theta \bold{1}},
where \bold{1} is the k-dimensional vector of 1 and k is
the number of observed variables.
The model-implied covariance matrix among first-order factors (\Phi_1)
can be calculated as:
\Phi_1 = \bold{B} \Phi_2 \bold{B}^{\prime} + \Psi_{u},
Thus, the proportion of variance among first-order common factors that is attributable to the second-order factor (i.e., coefficient omega at Level 2) can be calculated as:
\omega_{L2} = \frac{\bold{1}_F^{\prime} \bold{B} \Phi_2
\bold{B}^{\prime} \bold{1}_F}{\bold{1}_F^{\prime} \bold{B} \Phi_2
\bold{B}^{\prime} \bold{1}_F + \bold{1}_F^{\prime} \Psi_{u} \bold{1}_F},
where \bold{1}_F is the F-dimensional vector of 1 and F
is the number of first-order factors. This Level-2 omega can be interpreted
as an estimate of the reliability of a hypothetical composite calculated
from error-free observable variables representing the first-order common
factors. This might only be meaningful as a thought experiment.
An additional thought experiment is possible: If the observed indicators contained only the second-order common-factor variance and unsystematic measurement error, then there would be no first-order common factors because their unique variances would be excluded from the observed measures. An estimate of this hypothetical composite reliability can be calculated as the partial coefficient omega at Level 1, or the proportion of observed variance explained by the second-order factor after partialling out the uniqueness from the first-order factors:
\omega_{L1} = \frac{\bold{1}^{\prime} \Lambda \bold{B} \Phi_2
\bold{B}^{\prime} \Lambda^{\prime} \bold{1}}{\bold{1}^{\prime} \Lambda
\bold{B} \Phi_2 \bold{B}^{\prime} \Lambda^{\prime} \bold{1} +
\bold{1}^{\prime} \Theta \bold{1}},
Note that if the second-order factor has a direct factor loading on some observed variables, the observed variables will be counted as first-order factors, which might not be desirable.
Value
Reliability values at Levels 1 and 2 of the second-order factor, as well as the partial reliability value at Level 1
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
See Also
Examples
HS.model3 <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
higher =~ visual + textual + speed'
fit6 <- cfa(HS.model3, data = HolzingerSwineford1939)
reliability(fit6) # Should provide a warning for the endogenous variables
reliabilityL2(fit6, "higher")
Residual-center all target indicators by covariates
Description
This function regresses the target variables on the covariate(s) and replaces the target variables with the residuals from that regression. This procedure is useful for partialling a covariate out of the analysis model (Geldhof, Pornprasertmanit, Schoemann, & Little, 2013).
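For intuition, a minimal sketch of the underlying idea (assuming the attitude data from the datasets package, as in the example below; the function's exact output may differ in details such as variable naming):
res.complaints <- resid(lm(complaints ~ rating, data = attitude))  # residualize one target on the covariate
head(res.complaints)
## compare with the "complaints" column of residualCovariate(attitude, 2:7, 1)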
Usage
residualCovariate(data, targetVar, covVar)
Arguments
data |
The desired data to be transformed. |
targetVar |
Variable names or positions of the indicators that users wish to residual center (as dependent variables) |
covVar |
Covariate names or positions of the covariates used for residual centering the target variables (as independent variables) |
Value
The data set in which the target variables have been replaced by their residuals
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
References
Geldhof, G. J., Pornprasertmanit, S., Schoemann, A. M., & Little, T. D. (2013). Orthogonalizing through residual centering: Extended applications and caveats. Educational and Psychological Measurement, 73(1), 27–46. doi:10.1177/0013164412445473
See Also
indProd() For creating the indicator products with no
centering, mean centering, double-mean centering, or residual centering.
Examples
dat <- residualCovariate(attitude, 2:7, 1)
Deprecated functions in package semTools.
Description
The functions listed below are deprecated and will be defunct
in the near future. When possible, alternative functions with similar
functionality are also mentioned. Help pages for deprecated functions are
available at help("semTools-deprecated").
Usage
reliability(object, what = c("alpha", "omega", "omega2", "omega3", "ave"),
return.total = FALSE, dropSingle = TRUE, omit.factors = character(0),
omit.indicators = character(0), omit.imps = c("no.conv", "no.se"))
reliabilityL2(object, secondFactor, omit.imps = c("no.conv", "no.se"))
Reliability
The original reliability function was suboptimally designed.
For example, AVE was returned, which is not a reliability index. Also,
alpha and several omega-type coefficients were returned, including the
original formula that was inappropriate for models with complex structure.
Some features could be controlled by the user for one but not both types of
index. For example, alpha for categorical indicators was returned on both
the observed and latent-response scales, but this was not an option for any
omega-type indices. The omegas differed in terms of whether the observed or
model-implied covariance matrix was used in the denominator, but alpha was
only computed using the observed matrix. These inconsistencies have been
resolved in the new compRelSEM() function, which returns only
one reliability index per composite, tailored to the user's requested
features, for which there is much more flexibility. The average variance
extracted is now available in a dedicated AVE() function.
Higher-Order Reliability
Originally, composite reliability of a single higher-order factor was
estimated in a separate function: reliabilityL2. It is now available
for multiple higher-order factors in the same model, and from the same
compRelSEM() function that estimates reliability for first-order
factors, using the higher= argument to name higher-order factors in
the fitted model.
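For users migrating from these deprecated functions, a minimal sketch of the replacement workflow (assuming fit is a CFA fitted with lavaan that includes a higher-order factor named "higher"):
compRelSEM(fit)                      # one reliability coefficient per composite
compRelSEM(fit, higher = "higher")   # reliability of the higher-order composite
AVE(fit)                             # average variance extracted, per factor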
Simulated Data set to Demonstrate Random Allocations of Parcels
Description
A simulated data set with 2 factors with 9 indicators for each factor
Usage
simParcel
Format
A data.frame with 800 observations of 18 variables.
- f1item1
Item 1 loading on factor 1
- f1item2
Item 2 loading on factor 1
- f1item3
Item 3 loading on factor 1
- f1item4
Item 4 loading on factor 1
- f1item5
Item 5 loading on factor 1
- f1item6
Item 6 loading on factor 1
- f1item7
Item 7 loading on factor 1
- f1item8
Item 8 loading on factor 1
- f1item9
Item 9 loading on factor 1
- f2item1
Item 1 loading on factor 2
- f2item2
Item 2 loading on factor 2
- f2item3
Item 3 loading on factor 2
- f2item4
Item 4 loading on factor 2
- f2item5
Item 5 loading on factor 2
- f2item6
Item 6 loading on factor 2
- f2item7
Item 7 loading on factor 2
- f2item8
Item 8 loading on factor 2
- f2item9
Item 9 loading on factor 2
Source
Data were generated using the simsem package.
Examples
head(simParcel)
Single Parameter Test Divided from Nested Model Comparison
Description
In comparing two nested models, the \Delta\chi^2 test may indicate that
the two models are different. However, as with other omnibus tests, it does
not indicate which fixed parameters or constraints make the two models
different. This function helps researchers identify the significant
parameter(s).
Usage
singleParamTest(model1, model2, return.fit = FALSE,
method = "satorra.bentler.2001")
Arguments
model1 |
Model 1. |
model2 |
Model 2. Note that the two models must be nested, and the order of parameters in their parameter tables must be the same. That is, nested models with different scale-identification constraints may not be testable by this function. |
return.fit |
Return the submodels fitted by this function |
method |
The method used to calculate likelihood ratio test. See
|
Details
This function first identifies the differences between the two models. The model with more free parameters is referred to as the parent model, and the model with fewer free parameters is referred to as the nested model. Two tests are implemented here (a by-hand sketch of the free strategy follows these details):
- free: The nested model is used as a template. Then, one parameter indicating a difference between the two models is freed, and the new model is compared with the nested model. This process is repeated for all differences between the two models.
- fix: The parent model is used as a template. Then, one parameter indicating a difference between the two models is fixed or constrained to be equal to other parameters, and the new model is compared with the parent model. This process is repeated for all differences between the two models.
- mi: No longer available, because the test of modification indices is not consistent. For example, if two parameters are constrained to equality, the modification index from the first parameter is not equal to that from the second parameter.
Note that this function does not adjust for the inflated Type I error rate from multiple tests.
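A minimal by-hand sketch of the free strategy (assuming lavaan is loaded and m2 is the constrained model from the example below): one equality constraint, here on the x2 loading, is released and tested with a 1-df likelihood-ratio test; singleParamTest() automates this over all constrained parameters.
HS.model2.free.x2 <- ' visual  =~ a*x1 + x2 + a*x3
                       textual =~ b*x4 + b*x5 + b*x6 '
m2.free.x2 <- cfa(HS.model2.free.x2, data = HolzingerSwineford1939,
                  std.lv = TRUE, estimator = "MLR")
lavTestLRT(m2, m2.free.x2, method = "satorra.bentler.2001")  # 1-df test of the released constraint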
Value
If return.fit = FALSE, the result tables are provided.
\chi^2 and p values are provided for all methods. Note that each
\chi^2 test is based on 1 df. Expected parameter changes
and their standardized forms are also provided.
If return.fit = TRUE, a list with two elements is provided. The
first element is the tabular result. The second element is the submodels
used in the free and fix methods.
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
Examples
library(lavaan)
# Nested model comparison by hand
HS.model1 <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6'
HS.model2 <- ' visual =~ a*x1 + a*x2 + a*x3
textual =~ b*x4 + b*x5 + b*x6'
m1 <- cfa(HS.model1, data = HolzingerSwineford1939, std.lv = TRUE,
estimator = "MLR")
m2 <- cfa(HS.model2, data = HolzingerSwineford1939, std.lv = TRUE,
estimator = "MLR")
anova(m1, m2)
singleParamTest(m1, m2)
## Nested models to test measurement invariance
HW.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '
mod.config <- cfa(model = HW.model, data = HolzingerSwineford1939,
group = "school")
mod.metric <- cfa(model = HW.model, data = HolzingerSwineford1939,
group = "school", group.equal = "loadings")
singleParamTest(mod.config, mod.metric)
Finding skewness
Description
Finding skewness (g_{1}) of an object
Usage
skew(object, population = FALSE)
Arguments
object |
A vector for which the skewness is calculated |
population |
|
Details
The skewness computed by default is g_{1}, the third standardized
moment of the empirical distribution of object.
The population parameter skewness \gamma_{1} formula is
\gamma_{1} = \frac{\mu_{3}}{\mu^{3/2}_{2}},
where \mu_{i} denotes the i-th central moment.
The sample skewness statistic g_{1} is calculated as
g_{1} = \frac{k_{3}}{k^{3/2}_{2}},
where k_{i} is the i-th k-statistic.
The approximate sampling variance of the skewness estimate is
Var(\hat{g}_1) = \frac{6}{N},
where N is the sample size (so its standard error is \sqrt{6/N}).
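A minimal sketch of the sample skewness as the third standardized moment, with its approximate standard error (assuming x is any numeric vector):
x  <- rnorm(200)
N  <- length(x)
m2 <- mean((x - mean(x))^2)      # second central moment
m3 <- mean((x - mean(x))^3)      # third central moment
g1 <- m3 / m2^(3/2)              # sample skewness
SE <- sqrt(6 / N)                # approximate standard error
c(skew = g1, SE = SE, z = g1 / SE)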
Value
The estimated skewness, along with a test statistic if population = FALSE
Author(s)
Sunthud Pornprasertmanit (psunthud@gmail.com)
References
Weisstein, Eric W. (n.d.). Skewness. Retrieved from MathWorld–A Wolfram Web Resource: https://mathworld.wolfram.com/Skewness.html
See Also
- kurtosis() Find the univariate excess kurtosis of a variable
- mardiaSkew() Find Mardia's multivariate skewness of a set of variables
- mardiaKurtosis() Find Mardia's multivariate kurtosis of a set of variables
Examples
skew(1:5)
Randomly Split a Data Set into Halves
Description
This function randomly splits a data set into two halves, and saves the resulting data sets to the same folder as the original.
Usage
splitSample(dataset, path = "default", div = 2, type = "default",
name = "splitSample")
Arguments
dataset |
The original data set to be divided. Can be a file path to a
*.csv or *.dat file (headers will automatically be detected) or an R object
(matrix or dataframe). (Windows users: file path must be specified using
FORWARD SLASHES ( |
path |
File path to folder for output data sets. NOT REQUIRED if dataset is a filename. Specify ONLY if dataset is an R object, or desired output folder is not that of original data set. If path is specified as "object", output data sets will be returned as a list, and not saved to hard drive. |
div |
Number of output data sets. NOT REQUIRED if default, 2 halves. |
type |
Output file format ("dat" or "csv"). NOT REQUIRED unless desired output formatting differs from that of input, or dataset is an R object and csv formatting is desired. |
name |
Output file name. NOT REQUIRED unless desired output name differs from that of input, or input dataset is an R object. (If input is an R object and name is not specified, name will be "splitSample".) |
Details
This function randomly orders the rows of a data set, divides the data set into two halves, and saves the halves to the same folder as the original data set, preserving the original formatting. Data set type (*.csv or *.dat) and formatting (headers) are automatically detected, and output data sets will preserve input type and formatting unless specified otherwise. Input can be in the form of a file path (*.dat or *.csv), or an R object (matrix or dataframe). If input is an R object and path is default, output data sets will be returned as a list object.
Value
If path = "object", list of output data sets.
Otherwise, output will be saved to the hard drive in the same format as the input.
Author(s)
Corbin Quick (University of Michigan; corbinq@umich.edu)
Examples
#### Input is .dat file
#splitSample("C:/Users/Default/Desktop/MYDATA.dat")
#### Output saved to "C:/Users/Default/Desktop/" in .dat format
#### Names are "MYDATA_s1.dat" and "MYDATA_s2.dat"
#### Input is R object
## Split the CO2 dataset from the datasets package
library(datasets)
splitMyData <- splitSample(CO2, path = "object")
summary(splitMyData[[1]])
summary(splitMyData[[2]])
#### Output object splitMyData becomes list of output data sets
#### Input is .dat file in "C:/" folder
#splitSample("C:/testdata.dat", path = "C:/Users/Default/Desktop/", type = "csv")
#### Output saved to "C:/Users/Default/Desktop/" in *.csv format
#### Names are "testdata_s1.csv" and "testdata_s2.csv"
#### Input is R object
#splitSample(myData, path = "C:/Users/Default/Desktop/", name = "splitdata")
#### Output saved to "C:/Users/Default/Desktop/" in *.dat format
#### Names are "splitdata_s1.dat" and "splitdata_s2.dat"
Tukey's WSD post-hoc test of means for unequal variance and sample size
Description
This function computes Tukey's WSD post hoc test of means when variances and sample sizes are not equal across groups. It can be used as a post hoc test when comparing latent means in multiple group SEM.
Usage
tukeySEM(m1, m2, var1, var2, n1, n2, ng)
Arguments
m1 |
Mean of group 1. |
m2 |
Mean of group 2. |
var1 |
Variance of group 1. |
var2 |
Variance of group 2. |
n1 |
Sample size of group 1. |
n2 |
Sample size of group 2. |
ng |
Total number of groups to be compared (i.e., the number of groups compared in the omnibus test). |
Details
After conducting an omnibus test of means across three or more groups, researchers often wish to know which sets of means differ at a particular Type I error rate. Tukey's WSD test holds the error rate stable across multiple comparisons of means. This function implements an adaptation of Tukey's WSD test from Maxwell & Delaney (2004) that allows variances and sample sizes to differ across groups.
Value
A vector with three elements:
- q: The q statistic
- df: The degrees of freedom for the q statistic
- p: A p value based on the q statistic, df, and the total number of groups to be compared
Author(s)
Alexander M. Schoemann (East Carolina University; schoemanna@ecu.edu)
References
Maxwell, S. E., & Delaney, H. D. (2004). Designing experiments and analyzing data: A model comparison perspective (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Examples
## For a case where three groups have been compared:
## Group 1: mean = 3.91, var = 0.46, n = 246
## Group 2: mean = 3.96, var = 0.62, n = 465
## Group 3: mean = 2.94, var = 1.07, n = 64
## compare group 1 and group 2
tukeySEM(3.91, 3.96, 0.46, 0.62, 246, 465, 3)
## compare group 1 and group 3
tukeySEM(3.91, 2.94, 0.46, 1.07, 246, 64, 3)
## compare group 2 and group 3
tukeySEM(3.96, 2.94, 0.62, 1.07, 465, 64, 3)
Fit a lavaan model using 2-Stage Maximum Likelihood (TSML) estimation for missing data.
Description
This function automates 2-Stage Maximum Likelihood (TSML) estimation, optionally with auxiliary variables. Step 1 involves fitting a saturated model to the partially observed data set (to variables in the hypothesized model as well as auxiliary variables related to missingness). Step 2 involves fitting the hypothesized model to the model-implied means and covariance matrix (also called the "EM" means and covariance matrix) as if they were complete data. Step 3 involves correcting the Step-2 standard errors (SEs) and chi-squared statistic to account for additional uncertainty due to missing data (using information from Step 1; see References section for sources with formulas).
Usage
twostage(..., aux, fun, baseline.model = NULL)
lavaan.2stage(..., aux = NULL, baseline.model = NULL)
cfa.2stage(..., aux = NULL, baseline.model = NULL)
sem.2stage(..., aux = NULL, baseline.model = NULL)
growth.2stage(..., aux = NULL, baseline.model = NULL)
Arguments
... |
Arguments passed to the |
aux |
An optional character vector naming auxiliary variable(s) in
|
fun |
The character string naming the lavaan function used to fit the
Step-2 hypothesized model ( |
baseline.model |
An optional character string, specifying the lavaan
|
Details
All variables (including auxiliary variables) are treated as endogenous
variables in the Step-1 saturated model (fixed.x = FALSE), so data
are assumed continuous, although not necessarily multivariate normal
(dummy-coded auxiliary variables may be included in Step 1, but categorical
endogenous variables in the Step-2 hypothesized model are not allowed). To
avoid assuming multivariate normality, request se = "robust.huber.white". CAUTION: In addition to setting fixed.x = FALSE and conditional.x = FALSE in lavaan::lavaan(),
this function will automatically set meanstructure = TRUE,
estimator = "ML", missing = "fiml", and test = "standard". lavaan::lavaan()'s se option can only be
set to "standard" to assume multivariate normality or to
"robust.huber.white" to relax that assumption.
Value
The twostage object contains 3 fitted lavaan models (saturated, target/hypothesized, and baseline) as well as the names of auxiliary variables. None of the individual models provide the correct model results (except the point estimates in the target model are unbiased). Use the methods in twostage to extract corrected SEs and test statistics.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
References
Savalei, V., & Bentler, P. M. (2009). A two-stage approach to missing data: Theory and application to auxiliary variables. Structural Equation Modeling, 16(3), 477–497. doi:10.1080/10705510903008238
Savalei, V., & Falk, C. F. (2014). Robust two-stage approach outperforms robust full information maximum likelihood with incomplete nonnormal data. Structural Equation Modeling, 21(2), 280–302. doi:10.1080/10705511.2014.882692
See Also
Examples
## impose missing data for example
HSMiss <- HolzingerSwineford1939[ , c(paste("x", 1:9, sep = ""),
"ageyr","agemo","school")]
set.seed(12345)
HSMiss$x5 <- ifelse(HSMiss$x5 <= quantile(HSMiss$x5, .3), NA, HSMiss$x5)
age <- HSMiss$ageyr + HSMiss$agemo/12
HSMiss$x9 <- ifelse(age <= quantile(age, .3), NA, HSMiss$x9)
## specify CFA model from lavaan's ?cfa help page
HS.model <- '
visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
'
## use ageyr and agemo as auxiliary variables
out <- cfa.2stage(model = HS.model, data = HSMiss, aux = c("ageyr","agemo"))
## two versions of a corrected chi-squared test results are shown
out
## see Savalei & Bentler (2009) and Savalei & Falk (2014) for details
## the summary additionally provides the parameter estimates with corrected
## standard errors, test statistics, and confidence intervals, along with
## any other options that can be passed to parameterEstimates()
summary(out, standardized = TRUE)
## use parameter labels to fit a more constrained model
modc <- '
visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + a*x8 + a*x9
'
outc <- cfa.2stage(model = modc, data = HSMiss, aux = c("ageyr","agemo"))
## use the anova() method to test this constraint
anova(out, outc)
## like for a single model, two corrected statistics are provided
Class for the Results of 2-Stage Maximum Likelihood (TSML) Estimation for Missing Data
Description
This class contains the results of 2-Stage Maximum Likelihood (TSML)
estimation for missing data. The summary, anova, vcov
methods return corrected SEs and test statistics. Other methods are
simply wrappers around the corresponding lavaan::lavaan
methods.
Usage
## S4 method for signature 'twostage'
show(object)
## S4 method for signature 'twostage'
summary(object, ...)
## S4 method for signature 'twostage'
anova(object, h1 = NULL, baseline = FALSE)
## S4 method for signature 'twostage'
nobs(object, type = c("ntotal", "ngroups",
"n.per.group", "norig", "patterns", "coverage"))
## S4 method for signature 'twostage'
coef(object, type = c("free", "user"))
## S4 method for signature 'twostage'
vcov(object, baseline = FALSE)
## S4 method for signature 'twostage'
fitted.values(object, model = c("target", "saturated",
"baseline"), type = "moments", labels = TRUE)
## S4 method for signature 'twostage'
fitted(object, model = c("target", "saturated",
"baseline"), type = "moments", labels = TRUE)
## S4 method for signature 'twostage'
residuals(object, type = c("raw", "cor", "normalized",
"standardized"))
## S4 method for signature 'twostage'
resid(object, type = c("raw", "cor", "normalized",
"standardized"))
Arguments
object |
An object of class |
... |
arguments passed to |
h1 |
An object of class |
baseline |
|
type |
The meaning of this argument varies depending on which method it
is used for. Find detailed descriptions in the Value section
under |
model |
|
labels |
|
Value
show |
|
summary |
|
anova |
|
nobs |
|
coef |
|
vcov |
|
fitted.values, fitted |
|
residuals, resid |
|
Slots
saturated: A fitted lavaan::lavaan object containing the saturated model results
target: A fitted lavaan::lavaan object containing the target/hypothesized model results
baseline: A fitted lavaan::lavaan object containing the baseline/null model results
auxNames: A character vector (potentially of length == 0) of any auxiliary variable names, if used
Objects from the Class
Objects can be created via the
twostage() function.
Author(s)
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
See Also
Examples
# See the example from the twostage function