| Type: | Package | 
| Title: | A 'Shiny' Application for Inspecting Structural Topic Models | 
| Version: | 0.4.3 | 
| Date: | 2024-06-21 | 
| URL: | https://github.com/cschwem2er/stminsights | 
| BugReports: | https://github.com/cschwem2er/stminsights/issues | 
| Description: | This app enables interactive validation, interpretation and visualization of structural topic models from the 'stm' package by Roberts and others (2014) <doi:10.1111/ajps.12103>. It also includes helper functions for model diagnostics and extracting data from effect estimates. | 
| Imports: | stm (≥ 1.3.7), tidygraph (≥ 1.3.1), ggraph (≥ 2.2.1), igraph (≥ 2.0.3), ggrepel (≥ 0.9.5), shiny (≥ 1.8.1), shinyBS (≥ 0.6.0), shinydashboard (≥ 0.7.2), shinyjs (≥ 2.1.0), ggplot2 (≥ 3.5.1), purrr (≥ 1.0.2), stringr (≥ 1.5.1), dplyr (≥ 1.1.4), tibble (≥ 3.2.1), DT (≥ 0.33.0), readr (≥ 2.1.5), huge (≥ 1.3.5), stats, scales | 
| Suggests: | quanteda (≥ 4.0.2), knitr, rmarkdown | 
| License: | MIT + file LICENSE | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.2.3 | 
| VignetteBuilder: | knitr | 
| NeedsCompilation: | no | 
| Packaged: | 2024-06-21 11:13:08 UTC; kasus | 
| Author: | Carsten Schwemmer | 
| Maintainer: | Carsten Schwemmer <c.schwem2er@gmail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2024-06-21 12:20:02 UTC | 
get_diag: computes stm model diagnostics
Description
get_diag() is a helper function to compute the average and median semanticCoherence and exclusivity scores for a number of stm models. The function does not work for models with content covariates.
Usage
get_diag(models, outobj)
Arguments
| models | A list of stm models. | 
| outobj | The out object containing the documents used to fit the models, e.g. as returned by quanteda::convert(..., to = 'stm') or stm::prepDocuments(). | 
Value
Returns model diagnostics in a data frame.
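The underlying scores are the per-topic values computed by stm's semanticCoherence() and exclusivity(). As a rough sketch of the kind of summary involved (an illustration, not the package's exact implementation; mod stands for a fitted stm model and out for the object used to fit it):
library(stm)
coh <- semanticCoherence(mod, documents = out$documents)
excl <- exclusivity(mod)
# average and median per-model summaries, in the spirit of get_diag()
data.frame(statistic = c('mean', 'median'),
           coherence = c(mean(coh), median(coh)),
           exclusivity = c(mean(excl), median(excl)))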
Examples
library(stm)
library(dplyr)
library(ggplot2)
library(quanteda)
# prepare data
data <- corpus(gadarian, text_field = 'open.ended.response')
docvars(data)$text <- as.character(data)
data <- tokens(data, remove_punct = TRUE) |>
  tokens_wordstem() |>
  tokens_remove(stopwords('english')) |> dfm() |>
  dfm_trim(min_termfreq = 2)
out <- convert(data, to = 'stm')
# fit models
gadarian_3 <- stm(documents = out$documents,
                  vocab = out$vocab,
                  data = out$meta,
                  prevalence = ~ treatment + s(pid_rep),
                  K = 3,
                  max.em.its = 1, # reduce computation time for example
                  verbose = FALSE)
gadarian_5 <- stm(documents = out$documents,
                  vocab = out$vocab,
                  data = out$meta,
                  prevalence = ~ treatment + s(pid_rep),
                  K = 5,
                  max.em.its = 1, # reduce computation time for example
                  verbose = FALSE)
# get diagnostics
diag <- get_diag(models = list(
                 model_3 = gadarian_3,
                 model_5 = gadarian_5),
                 outobj = out)
## Not run: 
# plot diagnostics
diag |>
  ggplot(aes(x = coherence, y = exclusivity, color = statistic)) +
  geom_text(aes(label = name), nudge_x = 5) + geom_point() +
  labs(x = 'Semantic Coherence', y = 'Exclusivity') + theme_light()
## End(Not run)
get_effects: extract stm effect estimates
Description
get_effects() is a helper function to store effect estimates from stm's estimateEffect() in a data frame.
Usage
get_effects(
  estimates,
  variable,
  type,
  ci = 0.95,
  moderator = NULL,
  modval = NULL,
  cov_val1 = NULL,
  cov_val2 = NULL
)
Arguments
| estimates | The object containing estimates calculated with estimateEffect(). | 
| variable | The variable for which estimates should be extracted. | 
| type | The estimate type. Must be either 'pointestimate', 'continuous', or 'difference'. | 
| ci | The confidence interval for uncertainty estimates. Defaults to 0.95. | 
| moderator | The moderator variable in case you want to include an interaction effect. | 
| modval | The value of the moderator variable for an interaction effect. See examples for combining data for multiple values. | 
| cov_val1 | The first value of a covariate for type 'difference'. | 
| cov_val2 | The second value of a covariate for type 'difference'. | 
Value
Returns effect estimates in a tidy data frame.
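The examples below combine estimates for two moderator values with bind_rows(). An equivalent, more compact pattern (a sketch assuming the estimateEffect object prep_int defined in the examples) iterates over the moderator values with purrr::map_dfr():
library(purrr)
# one get_effects() call per moderator value, row-bound into a single data frame
effects_int <- map_dfr(c(0, 1), function(val)
  get_effects(estimates = prep_int,
              variable = 'pid_rep',
              type = 'continuous',
              moderator = 'treatment',
              modval = val))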
Examples
library(stm)
library(dplyr)
library(ggplot2)
# store effects
prep <- estimateEffect(1:3 ~ treatment + pid_rep, gadarianFit, gadarian)
effects <- get_effects(estimates = prep,
                      variable = 'treatment',
                      type = 'pointestimate')
# plot effects
effects |> filter(topic == 3) |>
  ggplot(aes(x = value, y = proportion)) +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.1, linewidth = 1) +
  geom_point(size = 3) +
  coord_flip() + theme_light() + labs(x = 'Treatment', y = 'Topic Proportion')
# combine estimates for interaction effects
prep_int <- estimateEffect(1:3 ~ treatment * s(pid_rep),
 gadarianFit, gadarian)
effects_int <- get_effects(estimates = prep_int,
                          variable = 'pid_rep',
                          type = 'continuous',
                          moderator = 'treatment',
                          modval = 1) |>
 bind_rows(
   get_effects(estimates = prep_int,
               variable = 'pid_rep',
               type = 'continuous',
               moderator = 'treatment',
               modval = 0)
 )
# plot interaction effects
effects_int  |>  filter(topic == 2) |>
 mutate(moderator = as.factor(moderator)) |>
 ggplot(aes(x = value, y = proportion, color = moderator,
 group = moderator, fill = moderator)) +
 geom_line() +
 geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2)  +
 theme_light() + labs(x = 'PID Rep.', y = 'Topic Proportion',
 color = 'Treatment', group = 'Treatment', fill = 'Treatment')
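The examples above cover the types 'pointestimate' and 'continuous'. For type 'difference', cov_val1 and cov_val2 specify the two covariate values to contrast; a hypothetical sketch reusing prep from above:
# contrast the two treatment conditions (0 vs. 1)
effects_diff <- get_effects(estimates = prep,
                            variable = 'treatment',
                            type = 'difference',
                            cov_val1 = 0,
                            cov_val2 = 1)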
get_network: extract topic correlation network
Description
 get_network() is a helper function to extract topic correlation networks
as tidygraph objects and add labels and topic proportions.
Arguments
| model | The stm model for computing the correlation network. | 
| method | The method for determining edges. Can be either 'simple' or 'huge'. | 
| cutoff | The correlation cutoff criterion for method = 'simple'. | 
| labels | An optional vector of topic labels. Must include a label for each topic of the model. | 
| cutiso | Remove isolated nodes without any edges from the network. Defaults to FALSE. | 
Value
Returns a tidygraph network of topic correlations.
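The correlations are the topic correlations that stm's topicCorr() computes. As a rough sketch of the kind of computation involved (not the package's exact code; it assumes the fitted model gadarian_10 from the example below):
library(stm)
library(igraph)
library(tidygraph)
# positive topic correlations above the cutoff, turned into a tidygraph object
corrs <- topicCorr(gadarian_10, method = 'simple', cutoff = 0.001)
graph <- as_tbl_graph(graph_from_adjacency_matrix(corrs$poscor,
                                                  mode = 'undirected',
                                                  weighted = TRUE,
                                                  diag = FALSE))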
Examples
library(stm)
library(ggraph)
library(quanteda)
# prepare data
data <- corpus(gadarian, text_field = 'open.ended.response')
docvars(data)$text <- as.character(data)
data <- tokens(data, remove_punct = TRUE) |>
  tokens_wordstem() |>
  tokens_remove(stopwords('english')) |> dfm() |>
  dfm_trim(min_termfreq = 2)
out <- convert(data, to = 'stm')
# fit model
gadarian_10 <- stm(documents = out$documents,
                   vocab = out$vocab,
                   data = out$meta,
                   prevalence = ~ treatment + s(pid_rep),
                   K = 10,
                   max.em.its = 1, # reduce computation time for example
                   verbose = FALSE)
## Not run: 
# extract network
stm_corrs <- get_network(model = gadarian_10,
                         method = 'simple',
                         labels = paste('Topic', 1:10),
                         cutoff = 0.001,
                         cutiso = TRUE)
# plot network
ggraph(stm_corrs, layout = 'auto') +
  geom_edge_link(
    aes(edge_width = weight),
    label_colour = '#fc8d62',
    edge_colour = '#377eb8') +
  geom_node_point(size = 4, colour = 'black')  +
  geom_node_label(
    aes(label = name, size = props),
    colour = 'black',  repel = TRUE, alpha = 0.85) +
  scale_size(range = c(2, 10), labels = scales::percent) +
  labs(size = 'Topic Proportion',  edge_width = 'Topic Correlation') +
  scale_edge_width(range = c(1, 3)) +
  theme_graph()
## End(Not run)
run_stminsights: launch the stminsights shiny app
Description
run_stminsights() launches the app to analyze structural topic models. It requires a .RData file with stm objects, as illustrated in the example below.
Usage
run_stminsights(use_browser = TRUE)
Arguments
| use_browser | Choose whether you want to launch the shiny app in your browser. Defaults to TRUE. | 
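A minimal call once a suitable .RData file exists (a sketch; with use_browser = FALSE the app typically opens in the default Shiny viewer, e.g. the RStudio viewer pane, rather than the system browser):
if (interactive()) {
  run_stminsights(use_browser = FALSE)
}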
Examples
 ## Not run: 
library(stm)
library(quanteda)
# prepare data
data <- corpus(gadarian, text_field = 'open.ended.response')
docvars(data)$text <- as.character(data)
data <- tokens(data, remove_punct = TRUE) |>
  tokens_wordstem() |>
  tokens_remove(stopwords('english')) |> dfm() |>
  dfm_trim(min_termfreq = 2)
out <- convert(data, to = 'stm')
# fit models and effect estimates
gadarian_3 <- stm(documents = out$documents,
                  vocab = out$vocab,
                  data = out$meta,
                  prevalence = ~ treatment + s(pid_rep),
                  K = 3,
                  max.em.its = 1, # reduce computation time for example
                  verbose = FALSE)
prep_3 <- estimateEffect(1:3 ~ treatment + s(pid_rep), gadarian_3,
                         meta = out$meta)
gadarian_5 <- stm(documents = out$documents,
                  vocab = out$vocab,
                  data = out$meta,
                  prevalence = ~ treatment + s(pid_rep),
                  K = 5,
                  max.em.its = 1, # reduce computation time for example
                  verbose = FALSE)
prep_5 <- estimateEffect(1:5 ~ treatment + s(pid_rep), gadarian_5,
                         meta = out$meta)
# save objects in .RData file
save.image(paste0(tempdir(), '/stm_gadarian.RData'))
# launch the app
if(interactive()){
  run_stminsights()
}
## End(Not run)