| Type: | Package |
| Title: | Spatial Bayesian Factor Analysis |
| Version: | 1.4.0 |
| Date: | 2025-09-30 |
| Description: | Implements a spatial Bayesian non-parametric factor analysis model with inference in a Bayesian setting using Markov chain Monte Carlo (MCMC). Spatial correlation is introduced in the columns of the factor loadings matrix using a Bayesian non-parametric prior, the probit stick-breaking process. Areal spatial data is modeled using a conditional autoregressive (CAR) prior and point-referenced spatial data is treated using a Gaussian process. The response variable can be modeled as Gaussian, probit, Tobit, or Binomial (using Polya-Gamma augmentation). Temporal correlation is introduced for the latent factors through a hierarchical structure and can be specified as exponential or first-order autoregressive. Full details of the package can be found in the accompanying vignette. Furthermore, the details of the package can be found in "Bayesian Non-Parametric Factor Analysis for Longitudinal Spatial Surfaces", by Berchuck et al (2019), <doi:10.1214/20-BA1253> in Bayesian Analysis. |
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
| Encoding: | UTF-8 |
| LazyData: | true |
| LazyDataCompression: | xz |
| RoxygenNote: | 7.3.2 |
| NeedsCompilation: | yes |
| Depends: | R (≥ 3.0.2) |
| Imports: | graphics, grDevices, msm (≥ 1.0.0), mvtnorm (≥ 1.0-0), pgdraw (≥ 1.0), Rcpp (≥ 0.12.9), stats, utils |
| Suggests: | coda, classInt, knitr, rmarkdown, womblR (≥ 1.0.3) |
| LinkingTo: | Rcpp, RcppArmadillo (≥ 0.7.500.0.0) |
| VignetteBuilder: | knitr |
| Language: | en-US |
| Author: | Samuel I. Berchuck [aut, cre] |
| Maintainer: | Samuel I. Berchuck <sib2@duke.edu> |
| Packaged: | 2025-09-30 12:39:45 UTC; sib2 |
| Repository: | CRAN |
| Date/Publication: | 2025-09-30 13:00:02 UTC |
spBFA
Description
spBFA is a package for Bayesian spatial factor analysis. A corresponding manuscript is forthcoming.
Author(s)
Samuel I. Berchuck sib2@duke.edu
Spatial factor analysis using a Bayesian hierarchical model.
Description
bfa_sp is a Markov chain Monte Carlo (MCMC) sampler for a Bayesian spatial factor analysis model. The spatial component is
introduced using a Probit stick-breaking process prior on the factor loadings. The model is implemented using a Bayesian hierarchical framework.
Usage
bfa_sp(
formula,
data,
dist,
time,
K,
L = Inf,
trials = NULL,
family = "normal",
temporal.structure = "exponential",
spatial.structure = "discrete",
starting = NULL,
hypers = NULL,
tuning = NULL,
mcmc = NULL,
seed = 54,
gamma.shrinkage = TRUE,
include.space = TRUE,
clustering = TRUE
)
Arguments
formula |
A |
data |
A required |
dist |
A |
time |
A |
K |
A scalar that indicates the dimension (i.e., quantity) of latent factors. |
L |
The number of latent clusters. If finite, a scalar indicating the number of clusters for each column of the factor loadings matrix. By default |
trials |
A variable in |
family |
Character string indicating the distribution of the observed data. Options
include: |
temporal.structure |
Character string indicating the temporal kernel. Options include:
|
spatial.structure |
Character string indicating the type of spatial process. Options include:
|
starting |
Either When |
hypers |
Either When
|
tuning |
Either When |
mcmc |
Either
|
seed |
An integer value used to set the seed for the random number generator (default = 54). |
gamma.shrinkage |
A logical indicating whether a gamma shrinkage process prior is used for the variances of the factor loadings columns. If FALSE, the hyperparameters (A1 and A2) indicate the shape and rate for a gamma prior on the precisions. Default is TRUE. |
include.space |
A logical indicating whether a spatial process should be included. Default is TRUE, however if FALSE the spatial correlation matrix
is fixed as an identity matrix. This specification overrides the |
clustering |
A logical indicating whether the Bayesian non-parametric process should be used, default is TRUE. If FALSE is specified each column is instead modeled with an independent spatial process. |
Details
Details of the underlying statistical model proposed by Berchuck et al. 2019. are forthcoming.
Value
bfa_sp returns a list containing the following objects
lambdaNKeep x (M x O x K)matrixof posterior samples for factor loadings matrixlambda. The labels for each column are Lambda_O_M_K.etaNKeep x (Nu x K)matrixof posterior samples for the latent factorseta. The labels for each column are Eta_Nu_K.betaNKeep x Pmatrixof posterior samples forbeta.sigma2NKeep x (M * (O - C))matrixof posterior samples for the variancessigma2. The labels for each column are Sigma2_O_M.kappaNKeep x ((O * (O + 1)) / 2)matrixof posterior samples forkappa. The columns have names that describe the samples within them. The row is listed first, e.g.,Kappa3_2refers to the entry in row3, column2.deltaNKeep x Kmatrixof posterior samples fordelta.tauNKeep x Kmatrixof posterior samples fortau.upsilonNKeep x ((K * (K + 1)) / 2)matrixof posterior samples forUpsilon. The columns have names that describe the samples within them. The row is listed first, e.g.,Upsilon3_2refers to the entry in row3, column2.psiNKeep x 1matrixof posterior samples forpsi.xiNKeep x (M x O x K)matrixof posterior samples for factor loadings cluster labelsxi. The labels for each column are Xi_O_M_K.rhoNKeep x 1matrixof posterior samples forrho.metropolis2 (or 1) x 3matrixof metropolis acceptance rates, updated tuners, and original tuners that result from the pilot adaptation.runtimeA
characterstring giving the runtime of the MCMC sampler.datobjA
listof data objects that are used in futurebfa_spfunctions and should be ignored by the user.dataugA
listof data augmentation objects that are used in futurebfa_spfunctions and should be ignored by the user.
References
Reference for Berchuck et al. 2019 is forthcoming.
Examples
###Load womblR for example visual field data
library(womblR)
###Format data for MCMC sampler
blind_spot <- c(26, 35) # define blind spot
VFSeries <- VFSeries[order(VFSeries$Location), ] # sort by location
VFSeries <- VFSeries[order(VFSeries$Visit), ] # sort by visit
VFSeries <- VFSeries[!VFSeries$Location %in% blind_spot, ] # remove blind spot locations
dat <- data.frame(Y = VFSeries$DLS / 10) # create data frame with scaled data
Time <- unique(VFSeries$Time) / 365 # years since baseline visit
W <- HFAII_Queen[-blind_spot, -blind_spot] # visual field adjacency matrix (data object from womblR)
M <- dim(W)[1] # number of locations
###Prior bounds for psi
TimeDist <- as.matrix(dist(Time))
BPsi <- log(0.025) / -min(TimeDist[TimeDist > 0])
APsi <- log(0.975) / -max(TimeDist)
###MCMC options
K <- 10 # number of latent factors
O <- 1 # number of spatial observation types
Hypers <- list(Sigma2 = list(A = 0.001, B = 0.001),
Kappa = list(SmallUpsilon = O + 1, BigTheta = diag(O)),
Delta = list(A1 = 1, A2 = 20),
Psi = list(APsi = APsi, BPsi = BPsi),
Upsilon = list(Zeta = K + 1, Omega = diag(K)))
Starting <- list(Sigma2 = 1,
Kappa = diag(O),
Delta = 2 * (1:K),
Psi = (APsi + BPsi) / 2,
Upsilon = diag(K))
Tuning <- list(Psi = 1)
MCMC <- list(NBurn = 1000, NSims = 1000, NThin = 2, NPilot = 5)
###Fit MCMC Sampler
reg.bfa_sp <- bfa_sp(Y ~ 0, data = dat, dist = W, time = Time, K = 10,
starting = Starting, hypers = Hypers, tuning = Tuning, mcmc = MCMC,
L = Inf,
family = "tobit",
trials = NULL,
temporal.structure = "exponential",
spatial.structure = "discrete",
seed = 54,
gamma.shrinkage = TRUE,
include.space = TRUE,
clustering = TRUE)
###Note that this code produces the pre-computed data object "reg.bfa_sp"
###More details can be found in the vignette.
diagnostics
Description
Calculates diagnostic metrics using output from the spBFA model.
Usage
diagnostics(
object,
diags = c("dic", "dinf", "waic"),
keepDeviance = FALSE,
keepPPD = FALSE,
Verbose = TRUE,
seed = 54
)
Arguments
object |
A |
diags |
A vector of character strings indicating the diagnostics to compute. Options include: Deviance Information Criterion ("dic"), d-infinity ("dinf") and Watanabe-Akaike information criterion ("waic"). At least one option must be included. Note: The probit model cannot compute the DIC or WAIC diagnostics due to computational issues with computing the multivariate normal CDF. |
keepDeviance |
A logical indicating whether the posterior deviance distribution is returned (default = FALSE). |
keepPPD |
A logical indicating whether the posterior predictive distribution at each observed location is returned (default = FALSE). |
Verbose |
A boolean logical indicating whether progress should be output (default = TRUE). |
seed |
An integer value used to set the seed for the random number generator (default = 54). |
Details
To assess model fit, DIC, d-infinity and WAIC are used. DIC is based on the deviance statistic and penalizes for the complexity of a model with an effective number of parameters estimate pD (Spiegelhalter et al 2002). The d-infinity posterior predictive measure is an alternative diagnostic tool to DIC, where d-infinity=P+G. The G term decreases as goodness of fit increases, and P, the penalty term, inflates as the model becomes over-fit, so small values of both of these terms and, thus, small values of d-infinity are desirable (Gelfand and Ghosh 1998). WAIC is invariant to parametrization and is asymptotically equal to Bayesian cross-validation (Watanabe 2010). WAIC = -2 * (lppd - p_waic_2). Where lppd is the log pointwise predictive density and p_waic_2 is the estimated effective number of parameters based on the variance estimator from Vehtari et al. 2016. (p_waic_1 is the mean estimator).
Value
diagnostics returns a list containing the diagnostics requested and
possibly the deviance and/or posterior predictive distribution objects.
Author(s)
Samuel I. Berchuck
References
Gelfand, A. E., & Ghosh, S. K. (1998). Model choice: a minimum posterior predictive loss approach. Biometrika, 1-11.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & Van Der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(4), 583-639.
Vehtari, A., Gelman, A., & Gabry, J. (2016). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 1-20.
Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research, 11(Dec), 3571-3594.
Examples
###Load pre-computed regression results
data(reg.bfa_sp)
###Compute and print diagnostics
diags <- diagnostics(reg.bfa_sp)
print(unlist(diags))
is.spBFA
Description
is.spBFA is a general test of an object being interpretable as a
spBFA object.
Usage
is.spBFA(x)
Arguments
x |
object to be tested. |
Details
The spBFA class is defined as the regression object that
results from the spBFA regression function.
Value
is.spBFA returns a logical, depending on whether the object is of class spBFA.
Examples
###Load pre-computed results
data(reg.bfa_sp)
###Test function
is.spBFA(reg.bfa_sp)
predict.spBFA
Description
Predicts future observations from the spBFA model.
Usage
## S3 method for class 'spBFA'
predict(
object,
NewTimes,
NewX = NULL,
NewTrials = NULL,
type = "temporal",
Verbose = TRUE,
seed = 54,
...
)
Arguments
object |
A |
NewTimes |
A numeric vector including desired time(s) points for prediction. |
NewX |
A matrix including covariates at times |
NewTrials |
An array indicating the trials for categorical predictions. The array must have dimension |
type |
A character string indicating the type of prediction, choices include "temporal" and "spatial". Spatial prediction has not been implemented yet. |
Verbose |
A boolean logical indicating whether progress should be output. |
seed |
An integer value used to set the seed for the random number generator (default = 54). |
... |
other arguments. |
Details
predict.spBFA uses Bayesian krigging to predict vectors at future
time points. The function returns the krigged factors (Eta) and also the observed outcomes (Y).
Value
predict.spBFA returns a list containing the following objects.
EtaA
listcontainingNNewVistismatrices, one for each new time prediction. Each matrix is dimensionNKeep x K, whereKis the number of latent factors Each matrix contains posterior samples obtained by Bayesian krigging.YA
listcontainingNNewVistisposterior predictive distribution matrices. Each matrix is dimensionNKeep x (M * O), whereMis the number of spatial locations andOthe number of observation types. Each matrix is obtained through Bayesian krigging.
Author(s)
Samuel I. Berchuck
Examples
###Load pre-computed regression results
data(reg.bfa_sp)
###Compute predictions
pred <- predict(reg.bfa_sp, NewTimes = 3)
pred.observations <- pred$Y$Y10 # observed data predictions
pred.krig <- pred$Eta$Eta10 # krigged parameters
Pre-computed regression results from bfa_sp
Description
The data object reg.bfa_sp is a pre-computed spBFA data object for use in the package vignette and function examples.
Usage
data(reg.bfa_sp)
Format
The data object reg.bfa_sp is a spBFA data object that is the result of implementing the MCMC code in the vignette.
It is presented here because the run-time would be unecessarily long when compiling the R package.