R/corpus-addsummary-metadata.R
summary_metadata.Rd
Functions to add or retrieve corpus summary metadata
add_summary_metadata(x, extended = FALSE, ...)
get_summary_metadata(x, ...)
summarize_texts_extended(x, stop_words = stopwords("en"), n = 100)
Argument | Description |
---|---|
x | corpus object |
... | additional arguments passed to |
add_summary_metadata() returns a corpus with summary metadata added as a data.frame, stored in the top-level list element named summary.
get_summary_metadata() returns the summary metadata as a data.frame.
summarize_texts_extended() returns extended summary information.
This is provided so that a corpus object can be stored with its summary information, which avoids recomputing that information every time summary.corpus() is called.
In future calls, if !is.null(meta(x, "summary", type = "system")) && !length(list(...)), then summary.corpus() will simply return get_system_meta() rather than compute the summary statistics on the fly, which requires tokenizing the text.
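The caching check described above can be illustrated with a minimal base R sketch. It does not use quanteda: the toy "corpus" is a plain list, and `compute_summary()`, `add_summary()`, and `cached_summary()` are hypothetical stand-ins invented for illustration, with a naive whitespace tokenizer in place of real tokenization.

```r
# Hypothetical stand-ins, not quanteda functions. A toy "corpus" is a list
# with a named character vector `texts` and a `meta` list for stored metadata.
compute_summary <- function(texts, split = "\\s+") {
  # Naive tokenization by splitting on a pattern; this is the costly step
  # that the stored summary is meant to avoid repeating.
  toks <- strsplit(texts, split)
  data.frame(
    Text = names(texts),
    Types = vapply(toks, function(t) length(unique(t)), integer(1)),
    Tokens = lengths(toks),
    row.names = NULL
  )
}

add_summary <- function(x) {
  # Store the computed summary alongside the texts.
  x$meta$summary <- compute_summary(x$texts)
  x
}

cached_summary <- function(x, ...) {
  # Reuse the stored summary only when one exists and no extra arguments
  # were supplied, since extra arguments could change the result.
  if (!is.null(x$meta$summary) && !length(list(...))) {
    return(x$meta$summary)
  }
  compute_summary(x$texts, ...)
}

toy <- list(texts = c(d1 = "a b b c", d2 = "x y"), meta = list())
toy <- add_summary(toy)
result <- cached_summary(toy)                  # returns the stored data.frame
recomputed <- cached_summary(toy, split = "")  # extra argument forces recomputation
```

The second call bypasses the stored summary because an extra argument is present, mirroring the `!length(list(...))` condition in the text.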
corp <- corpus(data_char_ukimmig2010)
corp <- quanteda:::add_summary_metadata(corp)
quanteda:::get_summary_metadata(corp)
#> Corpus consisting of 9 documents, showing 9 documents:
#>
#>          Text Types Tokens Sentences
#>           BNP  1125   3280        88
#>     Coalition   142    260         4
#>  Conservative   251    499        15
#>        Greens   322    677        21
#>        Labour   298    680        29
#>        LibDem   251    483        14
#>            PC    77    114         5
#>           SNP    88    134         4
#>          UKIP   346    722        26
#>
if (FALSE) {
# using extended summary
extended_data <- quanteda:::summarize_texts_extended(data_corpus_inaugural)
textplot_wordcloud(extended_data$top_dfm, max_words = 100)
library("ggplot2")
ggplot(data.frame(all_tokens = extended_data$all_tokens), aes(x = all_tokens)) +
    geom_histogram(color = "darkblue", fill = "lightblue") +
    xlab("Total length in tokens")
}