Get or replace the texts in a corpus, with grouping options.
Works for plain character vectors too, if groups is a factor.
texts(x, groups = NULL, spacer = " ")

texts(x) <- value

# S3 method for corpus
as.character(x, ...)
| Argument | Description |
|---|---|
| x | a corpus or character object |
| groups | either: a character vector containing the names of document variables to be used for grouping; or a factor or object that can be coerced into a factor equal in length or rows to the number of documents. See groups for details. |
| spacer | when concatenating texts by using groups, the spacing added between the texts |
| value | character vector of the new texts |
| ... | unused |
For texts, a character vector of the texts in the corpus.
For texts <-, the corpus with its texts replaced by value.
as.character(x), where x is a corpus, is equivalent to calling texts(x).
If groups is supplied, texts that share a value of groups are
concatenated into a single text, separated by spacer. The order in which
texts are aggregated within a group is not specified.
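The grouping behaviour described above can be approximated in base R. This is only a minimal sketch of the idea, not the package's implementation; the toy vector `txts` and the factor `grp` are hypothetical and are not part of quanteda.

```r
# Hypothetical toy data (not shipped with the package)
txts <- c(d1 = "First speech.", d2 = "Second speech.", d3 = "Another speaker.")
grp  <- factor(c("A", "A", "B"))

# Split the texts by group, then paste each group's texts together,
# separated by a single-space spacer
grouped <- vapply(split(txts, grp), paste, character(1), collapse = " ")
grouped
#>                              A                              B 
#> "First speech. Second speech."             "Another speaker." 
```

The result has one element per level of the grouping factor, named after that level, which mirrors the shape of the grouped output shown in the examples.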
As a matter of good text analysis practice, you are strongly encouraged
not to modify the substance of the texts in a corpus.
This sort of processing is better performed through downstream
operations. For instance, do not lowercase the texts in a corpus, or you
will never be able to recover the original case. Instead, apply
tokens_tolower after applying tokens to the
corpus, or use the option tolower = TRUE in dfm.
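The downstream approach recommended above might look like the following. This is a hedged sketch that assumes the quanteda package is installed; the one-document corpus `corp` is hypothetical and used only for illustration.

```r
library(quanteda)

# Hypothetical one-document corpus, for illustration only
corp <- corpus(c(doc1 = "The Quick Brown Fox."))

# Lowercase at the tokens stage; the corpus itself is left untouched
toks_lower <- tokens_tolower(tokens(corp))

# Alternatively, lowercase when building the document-feature matrix
dfmat <- dfm(tokens(corp), tolower = TRUE)

# The original case remains recoverable from the corpus
as.character(corp)
```

Because the corpus is never overwritten, other casing or cleaning choices can be tried later without rebuilding it from the source texts.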
nchar(texts(corpus_subset(data_corpus_inaugural, Year < 1806)))
#> 1789-Washington 1793-Washington      1797-Adams  1801-Jefferson  1805-Jefferson
#>            8618             790           13876           10136           12907

# grouping on a document variable
nchar(texts(corpus_subset(data_corpus_inaugural, Year < 1806), groups = "President"))
#>      Adams  Jefferson Washington
#>      13876      23045       9410

# grouping a character vector using a factor
nchar(data_char_ukimmig2010[1:5])
#>          BNP    Coalition Conservative       Greens       Labour
#>        18567         1471         2692         3841         3854
nchar(texts(data_corpus_inaugural[1:5],
            groups = as.factor(data_corpus_inaugural[1:5, "President"])))
#>      Adams  Jefferson Washington
#>      13876      23045       9410

# replacing the texts of a corpus
BritCorpus <- corpus(c("We must prioritise honour in our neighbourhood.",
                       "Aluminium is a valourous metal."))
texts(BritCorpus) <-
    stringi::stri_replace_all_regex(texts(BritCorpus),
                                    c("ise", "([nlb])our", "nium"),
                                    c("ize", "$1or", "num"),
                                    vectorize_all = FALSE)
texts(BritCorpus)
#>                                           text1
#> "We must prioritize honor in our neighborhood."
#>                                           text2
#>                 "Aluminum is a valorous metal."
texts(BritCorpus)[2] <- "New text number 2."
texts(BritCorpus)
#>                                           text1
#> "We must prioritize honor in our neighborhood."
#>                                           text2
#>                            "New text number 2."