Tidy topic models fit by the stm package. The arguments and return values
are similar to lda_tidiers.
# S3 method for STM
tidy(x, matrix = c("beta", "gamma", "theta"), log = FALSE, document_names = NULL, ...)

# S3 method for STM
augment(x, data, ...)

# S3 method for STM
glance(x, ...)
| Argument | Description |
|---|---|
| x | An STM fitted model object, created by |
| matrix | Whether to tidy the beta (per-term-per-topic, default) or gamma/theta (per-document-per-topic) matrix. The stm package calls the per-document-per-topic matrix theta, but other topic modeling packages call it gamma. |
| log | Whether beta/gamma/theta should be on a log scale, default FALSE |
| document_names | Optional vector of document names for use with per-document-per-topic tidying |
| ... | Extra arguments, not used |
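| data | For augment, the data to annotate: either a dfm or a table with one row per document-term pair, containing columns document and term |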
tidy returns a tidied version of either the beta or gamma matrix.
augment must be provided a data argument, either a
dfm or a table containing one row per original
document-term pair, such as is returned by tdm_tidiers, containing
columns document and term. It returns that same data as a table
with an additional column .topic with the topic assignment for that
document-term combination.
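The tidy-table form of data can be pictured with a small base R sketch. The count matrix, document names, and terms below are made up for illustration, and the count column is an extra that the description above does not require; only document and term are named there.

```r
# Hypothetical document-term counts: 2 documents, 3 terms
counts <- matrix(c(2, 0, 1,
                   0, 3, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("doc_a", "doc_b"),
                                 c("apple", "banana", "cherry")))

# Keep only the document-term pairs that actually occur
nonzero <- which(counts > 0, arr.ind = TRUE)

# One row per document-term pair, with columns document and term
doc_term <- data.frame(
  document = rownames(counts)[nonzero[, "row"]],
  term = colnames(counts)[nonzero[, "col"]],
  count = counts[nonzero],
  stringsAsFactors = FALSE
)
doc_term
```

A table of this shape, built from the same documents the model was fit on, is the kind of input the augment description refers to.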
glance always returns a one-row table, with columns
k: Number of topics in the model
docs: Number of documents in the model
terms: Number of terms in the model
iter: Number of iterations used
alpha: If an LDA model, the parameter of the Dirichlet distribution for topics over documents
If matrix == "beta" (default), tidy returns a table with one row per topic and term,
with columns
topic: Topic, as an integer
term: Term
beta: Probability of a term being generated from a topic according to the structural topic model
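The shape of the per-term-per-topic table, and the effect of the log argument, can be sketched in base R. This is a conceptual illustration only, not the package's implementation; the vocabulary and probabilities are invented, and the names log_beta and td_beta_sketch are hypothetical.

```r
# Hypothetical log-probabilities for 2 topics over 3 terms (rows = topics)
vocab <- c("apple", "banana", "cherry")
log_beta <- log(matrix(c(0.5, 0.3, 0.2,
                         0.1, 0.1, 0.8),
                       nrow = 2, byrow = TRUE))

# as.vector() reads the matrix column by column, so topic varies fastest
td_beta_sketch <- data.frame(
  topic = rep(seq_len(nrow(log_beta)), times = ncol(log_beta)),
  term = rep(vocab, each = nrow(log_beta)),
  beta = exp(as.vector(log_beta)),  # probability scale, as with log = FALSE
  stringsAsFactors = FALSE
)

# On the probability scale, each topic's term probabilities sum to 1
topic_totals <- tapply(td_beta_sketch$beta, td_beta_sketch$topic, sum)
topic_totals
```

Dropping the exp() call would leave the values on the log scale, which is what log = TRUE asks for.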
If matrix == "gamma" (or "theta"), tidy returns a table with one row per topic and document,
with columns
topic: Topic, as an integer
document: Document name (if given in vector of document_names) or ID as an integer
gamma: Probability of topic given document
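The per-document-per-topic layout can be pictured the same way. The sketch below reshapes a made-up document-by-topic matrix into one row per topic and document in base R; it illustrates the output shape under assumed values and is not the package's own code. The names theta, doc_names, and td_gamma_sketch are hypothetical.

```r
# Hypothetical topic proportions: 2 documents (rows) by 3 topics (columns)
theta <- matrix(c(0.7, 0.2, 0.1,
                  0.1, 0.3, 0.6),
                nrow = 2, byrow = TRUE)
doc_names <- c("doc_a", "doc_b")

# as.vector() reads column by column, so document varies fastest
td_gamma_sketch <- data.frame(
  document = rep(doc_names, times = ncol(theta)),
  topic = rep(seq_len(ncol(theta)), each = nrow(theta)),
  gamma = as.vector(theta),
  stringsAsFactors = FALSE
)

# Each document's topic probabilities sum to 1
doc_totals <- tapply(td_gamma_sketch$gamma, td_gamma_sketch$document, sum)
doc_totals
```

Without doc_names, the document column would fall back to the integer row index, matching the "ID as an integer" case described above.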
# NOT RUN {
if (requireNamespace("stm", quietly = TRUE) &&
    requireNamespace("quanteda", quietly = TRUE)) {
  library(dplyr)
  library(ggplot2)
  library(tidytext)  # provides the tidy() and augment() methods for STM objects
  library(stm)
  library(quanteda)

  # Tokenize explicitly before building the dfm; recent quanteda
  # versions no longer tokenize a corpus inside dfm()
  inaug <- data_corpus_inaugural %>%
    tokens(remove_punct = TRUE) %>%
    tokens_remove(stopwords("english")) %>%
    dfm()

  topic_model <- stm(inaug, K = 3, verbose = FALSE, init.type = "Spectral")

  # tidy the word-topic combinations
  td_beta <- tidy(topic_model)
  td_beta

  # Examine the three topics
  td_beta %>%
    group_by(topic) %>%
    top_n(10, beta) %>%
    ungroup() %>%
    ggplot(aes(term, beta)) +
    geom_col() +
    facet_wrap(~ topic, scales = "free") +
    coord_flip()

  # tidy the document-topic combinations, with optional document names
  td_gamma <- tidy(topic_model, matrix = "gamma",
                   document_names = rownames(inaug))
  td_gamma

  # find the assignments of each word in each document
  assignments <- augment(topic_model, inaug)
  assignments
}
# }