Construct a compressed version of a corpus.

corpuszip(x, docnames = NULL, docvars = NULL, text_field = "text",
  metacorpus = NULL, ...)

Arguments

x
a valid corpus source object
docnames
Names to be assigned to the texts. Defaults to the names of the character vector (if any); otherwise assigns "text1", "text2", etc.
docvars
A data frame of attributes associated with each text.
text_field
the name (character) or numeric index of the variable in the source data.frame to be read in as text. This variable must be a character vector. All other variables in the data.frame will be imported as docvars. This argument is only used for data.frame objects (including those created by readtext).
metacorpus
a named list containing additional (character) information to be added to the corpus as corpus-level metadata. Special fields recognized by summary.corpus are:
  • source: a description of the source of the texts, used for referencing;
  • citation: information on how to cite the corpus; and
  • notes: any additional information about who created the text, warnings, to-do lists, etc.
...
not used directly
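
The sketch below illustrates the kind of input that text_field and metacorpus describe. The data frame, its column names (speech, party), and the metadata values are made up for illustration. Because corpuszip() may not be present in every installed version of quanteda, the call is guarded, and the behaviour for data.frame input is assumed from the argument descriptions above rather than verified.

```r
# Illustrative inputs only: the column names and metadata values are made up.
speeches <- data.frame(
  speech = c("We will cut taxes.", "We will fund schools."),
  party  = c("A", "B"),
  stringsAsFactors = FALSE
)

# Corpus-level metadata using the three special field names listed above.
meta <- list(
  source   = "Hypothetical party leaflets",
  citation = "Placeholder citation",
  notes    = "Toy data for illustration"
)

# corpuszip() may be missing from the installed quanteda, so call it only
# when the package is installed and actually exports the function.
if (requireNamespace("quanteda", quietly = TRUE) &&
    "corpuszip" %in% getNamespaceExports("quanteda")) {
  # "speech" is read as the text; per the text_field description, the
  # remaining column ("party") is expected to be imported as a docvar.
  cz <- quanteda::corpuszip(speeches, text_field = "speech", metacorpus = meta)
  print(cz)
}
```

Passing the column by name rather than by numeric index keeps the call valid if the columns of the data frame are later reordered.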

Examples

# create a compressed corpus from texts
corpuszip(data_char_inaugural)
#> Corpus consisting of NULL document (compressed 65.8%).
# create a compressed corpus from texts and assign meta-data and document variables
cop <- corpus(data_char_ukimmig2010,
              docvars = data.frame(party = names(data_char_ukimmig2010)))
cop_zip <- corpuszip(data_char_ukimmig2010,
                     docvars = data.frame(party = names(data_char_ukimmig2010)))
object.size(cop)
#> 45136 bytes
object.size(cop_zip)
#> 21424 bytes