In this vignette we fit a Bayesian latent class analysis model with a prior on the number of components (classes) \(K\) to the fear data set. This multivariate data set contains three categorical variables with 3, 3, and 4 levels and latent class analysis was previously performed to extract two groups (Stern et al. 1994). We use the prior specification and the telescoping sampler for performing MCMC sampling as proposed in Frühwirth-Schnatter, Malsiner-Walli, and Grün (2021).
First, we load the package.
library("telescope")
We create the fear data set and define the profile of success probabilities pi_stern for the two groups identified by Stern et al. (1994) for benchmarking purposes.
freq <- c(5, 15, 3, 2, 4, 4, 3, 1, 1, 2, 4, 2, 0, 2, 0, 0, 1, 3, 2, 1,
          2, 1, 3, 3, 2, 4, 1, 0, 0, 4, 1, 3, 2, 2, 7, 3)
pattern <- cbind(F = rep(rep(1:3, each = 4), 3),
                 C = rep(1:3, each = 3 * 4),
                 M = rep(1:4, 9))
fear <- pattern[rep(seq_along(freq), freq), ]
pi_stern <- matrix(c(0.74, 0.26, 0.00, 0.71, 0.08, 0.21, 0.22, 0.60, 0.12, 0.06,
                     0.00, 0.32, 0.68, 0.28, 0.31, 0.41, 0.14, 0.19, 0.40, 0.27),
                   ncol = 10, byrow = TRUE)
We store the dimension of the data set.
N <- nrow(fear)
r <- ncol(fear)
We visualize the data set using a bubble plot.
plotBubble(fear)
For multivariate categorical observations \(\mathbf{y}_1,\ldots,\mathbf{y}_N\) the following model with hierarchical prior structure is assumed:
\[\begin{aligned}
\mathbf{y}_i \sim \sum_{k=1}^K \eta_k \prod_{j=1}^r \prod_{d=1}^{D_j} \pi_{k,jd}^{I\{y_{ij}=d\}}, & \qquad \text{where } \pi_{k,jd} = Pr(Y_{ij}=d|S_i=k),\\
K \sim p(K),&\\
\boldsymbol{\eta} \sim Dir(e_0),& \qquad \text{with } e_0 \text{ fixed, } e_0 \sim p(e_0) \text{ or } e_0=\frac{\alpha}{K}, \text{ with } \alpha \text{ fixed or } \alpha \sim p(\alpha),\\
\boldsymbol{\pi}_{k,j} \sim Dir(a_0),&
\end{aligned}\]
where \(D_j\) denotes the number of levels of the \(j\)-th variable.
For MCMC sampling we need to specify Mmax, the maximum number of iterations, thin, the thinning imposed to reduce auto-correlation in the chain by only recording every thin-th draw, and burnin, the number of burn-in iterations not recorded.
Mmax <- 5000
thin <- 10
burnin <- 1000
The specifications of Mmax and thin imply M, the number of recorded draws.
M <- Mmax/thin
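With the values chosen above this yields \(M = 5000/10 = 500\) recorded draws:
M
## [1] 500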
We specify with Kmax the maximum number of components possible during sampling. Kinit denotes the initial number of filled components.
Kmax <- 50
Kinit <- 10
We fit a dynamic specification on the weights.
G <- "MixDynamic"
For a static specification one would need to use "MixStatic".
For the dynamic setup, we specify the F-distribution \(F(6,3)\) as the prior on alpha.
priorOnAlpha <- priorOnAlpha_spec("F_6_3")
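To get a sense of the shape of this prior, one can sketch the \(F(6,3)\) density with base R (a quick visual check, not part of the original analysis). The prior puts most of its mass on small values of \(\alpha\) while retaining a heavy right tail.
# Sketch of the F(6, 3) prior density on alpha.
curve(df(x, df1 = 6, df2 = 3), from = 0, to = 10,
      xlab = expression(alpha), ylab = "prior density")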
We need to select the prior on K. We use the prior \(K-1 \sim BNB(1, 4, 3)\) as suggested in Frühwirth-Schnatter, Malsiner-Walli, and Grün (2021) for a model-based clustering context.
priorOnK <- priorOnK_spec("BNB_143")
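To get a feel for this prior one can simulate from it, assuming the standard beta-negative-binomial construction (a sketch, not part of the original vignette): a \(BNB(1, 4, 3)\) variable is a negative binomial count with size 1 whose success probability is drawn from a \(Beta(4, 3)\) distribution.
set.seed(42)
p_sim <- rbeta(10000, shape1 = 4, shape2 = 3)        # mixing distribution
K_sim <- 1 + rnbinom(10000, size = 1, prob = p_sim)  # K - 1 ~ BNB(1, 4, 3)
prop.table(table(K_sim))[1:5]                        # approximate prior p(K)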
Now, we specify the component-specific priors. We use a symmetric Dirichlet prior for the success probabilities \(\boldsymbol{\pi}_{k,j} \sim \mathcal{Dir}(a_0)\) for each variable with \(a_0=1\) inducing a uniform prior.
cat <- apply(fear, 2, max)
a0 <- rep(1, sum(cat))
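Given how the fear data set was constructed, cat records the numbers of levels 3, 3 and 4 of the three variables, so a0 is a vector of ten ones:
cat
## F C M 
## 3 3 4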
We obtain an initial partition S_0 and initial component weights eta_0 using either kmodes() or kmeans().
set.seed(1234)
if (requireNamespace("klaR", quietly = TRUE)) {
  cl_y <- klaR::kmodes(data = fear, modes = Kinit,
                       iter.max = 20, weighted = FALSE)
} else {
  cl_y <- kmeans(fear, centers = Kinit, iter.max = 20)
}
S_0 <- cl_y$cluster
eta_0 <- cl_y$size/N
We determine initial values pi_0 for the success probabilities based on the initial partition.
pi_0 <- do.call("cbind", lapply(1:r, function(j) {
  prop.table(table(S_0, fear[, j]), 1)
}))
round(pi_0, 2)
## 1 2 3 1 2 3 1 2 3 4
## 1 0.00 0.00 1.00 0.00 1.00 0.00 0.40 0.20 0.00 0.40
## 2 0.00 0.00 1.00 0.00 0.15 0.85 0.15 0.08 0.69 0.08
## 3 0.00 0.33 0.67 0.89 0.11 0.00 0.00 0.11 0.89 0.00
## 4 0.70 0.20 0.10 0.00 0.00 1.00 0.20 0.70 0.10 0.00
## 5 0.00 0.86 0.14 0.86 0.14 0.00 0.86 0.14 0.00 0.00
## 6 0.29 0.14 0.57 0.86 0.14 0.00 0.00 0.14 0.00 0.86
## 7 0.00 1.00 0.00 0.17 0.83 0.00 0.00 0.67 0.33 0.00
## 8 1.00 0.00 0.00 1.00 0.00 0.00 0.71 0.00 0.29 0.00
## 9 0.00 0.78 0.22 0.00 0.11 0.89 0.00 0.22 0.11 0.67
## 10 0.90 0.10 0.00 0.90 0.10 0.00 0.00 0.95 0.05 0.00
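As a small sanity check (added here for illustration), each row of pi_0 concatenates one probability vector per variable, so all row sums should equal the number of variables r:
stopifnot(all(abs(rowSums(pi_0) - r) < 1e-12))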
Using this prior specification as well as initialization and MCMC settings, we draw samples from the posterior using the telescoping sampler.
The first argument of the sampling function is the data, followed by the initial partition and the initial parameter values for the component-specific success probabilities and weights. The next argument corresponds to the hyperparameter of the prior setup (a0). Then the settings for the MCMC sampling are specified using M, burnin, thin and Kmax. Finally, the prior specification for the weights and the prior on the number of components are given (G, priorOnK, priorOnAlpha).
result <- sampleLCA(
  fear, S_0, pi_0, eta_0, a0,
  M, burnin, thin, Kmax, G, priorOnK, priorOnAlpha)
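To obtain a first overview of the returned object, one may inspect its structure (output omitted here, as it is lengthy):
str(result, max.level = 1)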
The sampling function returns a named list containing the sampled parameters and latent variables. The list includes the component-specific success probabilities Pi, the weights Eta, the assignments S, the number of observations Nk assigned to components, the number of components K, the number of filled components Kplus, the parameter values corresponding to the mode of the nonnormalized posterior nonnormpost_mode, and the acceptance rate acc in the Metropolis-Hastings step when sampling \(\alpha\) or \(e_0\), together with the sampled values of \(\alpha\) and \(e_0\). These values can be extracted for post-processing.
Eta <- result$Eta
Pi <- result$Pi
Nk <- result$Nk
S <- result$S
K <- result$K
Kplus <- result$Kplus
nonnormpost_mode <- result$nonnormpost_mode
e0 <- result$e0
alpha <- result$alpha
acc <- result$acc
We inspect the acceptance rate when sampling \(\alpha\) and the trace plot of the sampled alpha:
acc <- sum(acc)/M
acc
## [1] 0.406
plot(1:length(alpha), alpha, type = "l",
ylim = c(0, max(alpha)),
xlab = "iterations", ylab = expression(alpha))
We further assess the distribution of the sampled \(\alpha\) by inspecting a histogram as well as by determining the mean and some quantiles.
hist(alpha, freq = FALSE, breaks = 50)
mean(alpha)
## [1] 5.109542
quantile(alpha, probs = c(0.25, 0.5, 0.75))
## 25% 50% 75%
## 1.109833 2.491861 4.706211
We also inspect the trace plot of the induced hyperparameter \(e_0\):
plot(1:length(e0), e0, type = "l",
ylim = c(0, max(e0)),
xlab = "iterations", ylab = expression(e["0"]))
In addition we plot a histogram of the induced \(e_0\) and determine its mean and some quantiles.
hist(e0, freq = FALSE, breaks = 50)
mean(e0)
## [1] 2.000098
quantile(e0, probs = c(0.25, 0.5, 0.75))
## 25% 50% 75%
## 0.3382706 0.8069247 1.7778566
To further assess convergence, we also inspect the trace plots for the number of components \(K\) and the number of filled components \(K_+\).
plot(1:length(K), K, type = "l", ylim = c(0, 30),
xlab = "iteration", main = "", ylab = "number",
lwd = 0.5, col = "grey")
points(1:length(K), Kplus, type = "l", col = "red3",
lwd = 2, lty = 1)
legend("topright", legend = c("K", "K+"), col = c("grey", "red3"),
lwd = 2)
We determine the posterior distribution of the number of filled components \(K_+\), approximated using the telescoping sampler. We visualize the distribution of \(K_+\) using a barplot.
Kplus <- rowSums(Nk != 0, na.rm = TRUE)
p_Kplus <- tabulate(Kplus, nbins = max(Kplus))/M
barplot(p_Kplus/sum(p_Kplus), xlab = expression(K["+"]), names = 1:length(p_Kplus),
        col = "red3", ylab = expression("p(" ~ K["+"] ~ "|" ~ bold(y) ~ ")"))
The distribution of \(K_+\) is also characterized using the 1st and 3rd quartile as well as the median.
quantile(Kplus, probs = c(0.25, 0.5, 0.75))
## 25% 50% 75%
## 2.0 2.5 3.0
We obtain a point estimate for \(K_+\) by taking the mode and determine the number of MCMC draws where exactly \(K_+\) components were filled.
Kplus_hat <- which.max(p_Kplus)
Kplus_hat
## [1] 2
M0 <- sum(Kplus == Kplus_hat)
M0
## [1] 250
We also determine the posterior distribution of the number of components \(K\) directly drawn using the telescoping sampler.
p_K <- tabulate(K, nbins = max(K))/M
quantile(K, probs = c(0.25, 0.5, 0.75))
## 25% 50% 75%
## 2 3 4
barplot(p_K/sum(p_K), names = 1:length(p_K), xlab = "K",
ylab = expression("p(" ~ K ~ "|" ~ bold(y) ~ ")"))
which.max(tabulate(K, nbins = max(K)))
## [1] 2
First we select those draws where the number of filled groups was exactly \(\hat{K}_+\):
index <- Kplus == Kplus_hat
Nk[is.na(Nk)] <- 0
Nk_Kplus <- (Nk[Kplus == Kplus_hat, ] > 0)
In the following we extract the cluster success probabilities, data cluster sizes and cluster assignments for the draws where exactly \(\hat{K}_+\) components were filled.
# Extract the success probability draws of the filled components.
Pi_inter <- Pi[index, , ]
Pi_Kplus <- array(0, dim = c(M0, sum(cat), Kplus_hat))
for (j in 1:sum(cat)) {
  Pi_Kplus[, j, ] <- Pi_inter[, , j][Nk_Kplus]
}

# Extract the weights of the filled components and renormalize them.
Eta_inter <- Eta[index, ]
Eta_Kplus <- matrix(Eta_inter[Nk_Kplus], ncol = Kplus_hat)
Eta_Kplus <- sweep(Eta_Kplus, 1, rowSums(Eta_Kplus), "/")
# Relabel the assignments so that the filled components are numbered
# consecutively from 1 to Kplus_hat.
w <- which(index)
S_Kplus <- matrix(0, M0, N)
for (i in seq_along(w)) {
  m <- w[i]
  perm_S <- rep(0, Kmax)
  perm_S[Nk[m, ] != 0] <- 1:Kplus_hat
  S_Kplus[i, ] <- perm_S[S[m, ]]
}
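As a quick consistency check (added for illustration), every relabeled assignment should now refer to one of the \(\hat{K}_+\) filled components:
stopifnot(all(S_Kplus %in% 1:Kplus_hat))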
For model identification, we cluster the draws of the success probabilities where exactly \(\hat{K}_+\) components were filled in the point process representation using \(k\)-means clustering.
Func_init <- nonnormpost_mode[[Kplus_hat]]$pi
identified_Kplus <- identifyMixture(
  Pi_Kplus, Pi_Kplus, Eta_Kplus, S_Kplus, Func_init)
We inspect the non-permutation rate to assess how well separated the data clusters are and thus how easily a suitable relabeling of the draws can be obtained. Low values of the non-permutation rate, i.e., values close to zero, indicate that the solution can be easily identified, pointing to a good clustering solution.
identified_Kplus$non_perm_rate
## [1] 0
The relabeled draws are also returned and can be used to determine posterior mean values for the cluster-specific parameters. The estimated success probabilities are compared to the estimates reported in Stern et al. (1994).
colMeans(identified_Kplus$Eta)
## [1] 0.5377879 0.4622121
Pi_identified <- colMeans(identified_Kplus$Mu)
round(t(Pi_identified), digits = 2)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 0.62 0.28 0.10 0.67 0.11 0.22 0.22 0.57 0.13 0.08
## [2,] 0.07 0.31 0.63 0.27 0.31 0.42 0.14 0.16 0.41 0.28
pi_stern
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 0.74 0.26 0.00 0.71 0.08 0.21 0.22 0.60 0.12 0.06
## [2,] 0.00 0.32 0.68 0.28 0.31 0.41 0.14 0.19 0.40 0.27
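Assuming the group labels align as in the output above, the agreement can also be summarized numerically (a quick check, not part of the original vignette; output omitted as it depends on the MCMC run):
# Largest absolute difference to the Stern et al. (1994) estimates.
max(abs(t(Pi_identified) - pi_stern))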
The final partition is obtained by assigning each observation to the group where it was assigned most frequently during sampling.
z_sp <- apply(identified_Kplus$S, 2,
              function(x) which.max(tabulate(x, Kplus_hat)))
table(z_sp)
## z_sp
## 1 2
## 51 42
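As a final consistency check, the relative group sizes of this partition, 51/93 and 42/93, align well with the posterior mean weights obtained above:
table(z_sp)/N
## z_sp
##         1         2 
## 0.5483871 0.4516129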