Functions to add or retrieve corpus summary metadata

add_summary_metadata(x, extended = FALSE, ...)

get_summary_metadata(x, ...)

summarize_texts_extended(x, stop_words = stopwords("en"), n = 100)

Arguments

x

a corpus object

...

additional arguments passed to tokens() when computing the summary information

Value

add_summary_metadata() returns the corpus with the summary metadata added as a data.frame, stored in the top-level list element named summary.

get_summary_metadata() returns the summary metadata as a data.frame.
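To show the shape of that data.frame, here is a minimal base-R sketch that builds a table with the same columns as the example output below (Text, Types, Tokens, Sentences). It is an illustration under stated assumptions, not quanteda's implementation: summarize_texts_simple() is a hypothetical name, tokens are whitespace-separated strings rather than the output of tokens(), and sentences are approximated by counting runs of terminal punctuation.

```r
# Hypothetical, simplified stand-in for the per-document summary table.
# Assumptions: whitespace splitting instead of quanteda's tokens(), and
# sentences counted as runs of ".", "!" or "?" (so a text without
# terminal punctuation gets 0 sentences here).
# texts: a named character vector; the names become the Text column.
summarize_texts_simple <- function(texts) {
  # one character vector of tokens per document
  toks <- strsplit(trimws(texts), "\\s+")
  data.frame(
    Text = names(texts),
    # distinct token forms per document (case-sensitive)
    Types = vapply(toks, function(t) length(unique(t)), integer(1)),
    Tokens = lengths(toks),
    Sentences = lengths(regmatches(texts, gregexpr("[.!?]+", texts))),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

txts <- c(doc1 = "The cat sat. The cat ran!", doc2 = "Hello world")
print(summarize_texts_simple(txts), row.names = FALSE)
```

Because punctuation stays attached to words here ("sat." and "sat" would be different types), the counts will generally differ from those quanteda reports.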

summarize_texts_extended() returns extended summary information; the example below accesses its top_dfm and all_tokens elements.

Details

This is provided so that a corpus object can be stored with summary information, to avoid having to compute it every time summary.corpus() is called.

So, in future calls, if !is.null(meta(x, "summary", type = "system")) && !length(list(...)) is TRUE, that is, if cached summary metadata exists and no additional arguments were passed through ..., then summary.corpus() simply returns get_system_meta() instead of computing the summary statistics on the fly, which would require tokenizing the texts.
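The caching rule above can be sketched in base R. This is a minimal illustration, not quanteda's code: the attribute "summary_meta" is a hypothetical stand-in for meta(x, "summary", type = "system"), and compute_summary(), add_summary_cache() and summary_cached() are hypothetical names. compute_summary() only counts whitespace-separated tokens and ignores whatever is passed through ..., which in the real function would be forwarded to tokens().

```r
# Hypothetical stand-in for the real summary computation: counts
# whitespace-separated tokens only. Arguments in `...` are accepted but
# ignored here (the real function would forward them to tokens()).
compute_summary <- function(x, ...) {
  data.frame(Text = names(x),
             Tokens = lengths(strsplit(trimws(x), "\\s+")),
             row.names = NULL)
}

# Compute the summary once and store it on the object. The attribute
# "summary_meta" stands in for the corpus's system metadata.
add_summary_cache <- function(x, ...) {
  attr(x, "summary_meta") <- compute_summary(x, ...)
  x
}

summary_cached <- function(x, ...) {
  cached <- attr(x, "summary_meta")
  # Same condition as documented: use the cache only if it exists
  # and no extra arguments were supplied through `...`
  if (!is.null(cached) && !length(list(...))) {
    return(cached)
  }
  # Otherwise recompute on the fly
  compute_summary(x, ...)
}

x <- c(a = "one two three", b = "four")
y <- add_summary_cache(x)
summary_cached(y)                       # served from the stored attribute
summary_cached(y, remove_punct = TRUE)  # extra argument: recomputed
```

The design choice mirrored here is that any extra argument invalidates the cache, because the stored summary was computed with default tokenization and cannot reflect different tokens() options.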

Examples

library("quanteda")
corp <- corpus(data_char_ukimmig2010)
corp <- quanteda:::add_summary_metadata(corp)
quanteda:::get_summary_metadata(corp)
#> Corpus consisting of 9 documents, showing 9 documents:
#> 
#>          Text Types Tokens Sentences
#>           BNP  1125   3280        88
#>     Coalition   142    260         4
#>  Conservative   251    499        15
#>        Greens   322    677        21
#>        Labour   298    680        29
#>        LibDem   251    483        14
#>            PC    77    114         5
#>           SNP    88    134         4
#>          UKIP   346    722        26
#> 
if (FALSE) {
  # using extended summary
  extended_data <- quanteda:::summarize_texts_extended(data_corpus_inaugural)
  textplot_wordcloud(extended_data$top_dfm, max_words = 100)

  library("ggplot2")
  ggplot(data.frame(all_tokens = extended_data$all_tokens), aes(x = all_tokens)) +
    geom_histogram(color = "darkblue", fill = "lightblue") +
    xlab("Total length in tokens")
}